[OAI-implementers] Sets in OAI-PMH and DSpace

Hussein Suleman hussein@cs.uct.ac.za
Tue, 21 Oct 2003 19:04:21 +0200


here are my thoughts on this ...

hypothetically, how about if we think of the OAI-PMH as a transaction 
distribution framework? i.e., each data provider maintains every 
change of archive state as a transaction and then translates these into 
instantaneous views of the archive whenever OAI-PMH requests arrive. 
(somewhat db-like i guess)
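
to make the hypothetical concrete, here is a minimal python sketch of such a 
transaction log, replayed into an instantaneous view at a given time. 
everything in it (the Transaction fields, the "put"/"delete" op names, 
view_at) is an assumption made for illustration; none of it is defined by 
OAI-PMH or taken from any existing data provider.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Transaction:
    """One change of archive state (hypothetical structure)."""
    when: datetime
    identifier: str
    op: str                      # "put" = create/update/move, "delete" = remove
    sets: frozenset = frozenset()  # set membership after a "put"


def view_at(log, until):
    """Replay the log up to `until`.

    Returns a dict: identifier -> (datestamp, sets, deleted).
    """
    state = {}
    for tx in sorted(log, key=lambda t: t.when):
        if tx.when > until:
            break
        if tx.op == "put":
            state[tx.identifier] = (tx.when, tx.sets, False)
        elif tx.op == "delete":
            state[tx.identifier] = (tx.when, frozenset(), True)
    return state


# hypothetical log: record 1 moves from "theses" to "reports",
# record 2 is created and later deleted
LOG = [
    Transaction(datetime(2003, 10, 1), "oai:example.org:1", "put", frozenset({"theses"})),
    Transaction(datetime(2003, 10, 5), "oai:example.org:1", "put", frozenset({"reports"})),
    Transaction(datetime(2003, 10, 7), "oai:example.org:2", "put", frozenset({"theses"})),
    Transaction(datetime(2003, 10, 9), "oai:example.org:2", "delete"),
]

snapshot = view_at(LOG, datetime(2003, 10, 6))
```

note that moving a record between sets is just another "put" with a new 
datestamp, and every data provider would have to keep this log, which is 
exactly the extra bookkeeping at issue.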

now, while i know lots of people would like a deterministic solution to 
the "problem", i think complete solutions such as what i just outlined 
go against the fundamental philosophy that OAI-PMH should be easy for 
data providers to implement (even if that possibly requires more effort from 
service providers).

service providers SHOULD be willing to sift through records to find what 
they need - i am sure this was discussed many moons ago ... that said, 
NDLTD currently uses the approach that Robert has suggested as an option 
- no set structure at all! simply because the amount of work necessary 
to come up with and manage meaningful sets is non-trivial and it is so 
much more important to share the data in the first place.

if the sets are not incredibly obvious (e.g., a thesis will always be a 
thesis!) i would not use sets - instead encode what is essentially 
"attribute" information into your metadata directly (in whatever formats 
you support) so service providers can sift through records on the basis 
of this information.
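
as a rough illustration of that sifting on the service provider side, the 
python sketch below filters harvested oai_dc records by their dc:type value. 
the function name, the sample response and the choice of dc:type as the 
"attribute" are assumptions for illustration only; a real harvester would 
also have to deal with resumption tokens and other metadata formats.

```python
import xml.etree.ElementTree as ET

# namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def records_of_type(list_records_xml, wanted):
    """Return identifiers of records whose dc:type matches `wanted` (case-insensitive)."""
    root = ET.fromstring(list_records_xml)
    hits = []
    for rec in root.iterfind(".//oai:record", NS):
        types = [t.text.strip().lower()
                 for t in rec.iterfind(".//dc:type", NS) if t.text]
        if wanted.lower() in types:
            hits.append(rec.findtext("oai:header/oai:identifier", namespaces=NS))
    return hits


# hypothetical, heavily trimmed ListRecords response
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <ListRecords>
  <record>
   <header><identifier>oai:example.org:1</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>a thesis</dc:title>
     <dc:type>Thesis</dc:type>
    </oai_dc:dc>
   </metadata>
  </record>
  <record>
   <header><identifier>oai:example.org:2</identifier></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>a journal article</dc:title>
     <dc:type>Article</dc:type>
    </oai_dc:dc>
   </metadata>
  </record>
 </ListRecords>
</OAI-PMH>"""

theses = records_of_type(SAMPLE, "thesis")
```

the data provider only has to emit the attribute once in its metadata; all 
of the selection logic lives with the service provider.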

please let us stay away from any practices that make OAI-PMH 
implementation more difficult for data providers.


Tansley, Robert wrote:

> Hello,
> This is related to a discussion on the OAI-implementers list about moving records in and out of sets.  The next release of DSpace will have some functionality which will affect how DSpace might implement sets.
> Briefly, DSpace currently has Communities at the top level.  Communities contain Collections, and Collections contain Items.  The structure of Communities and Collections is exposed to harvesters via the OAI-PMH set mechanism.  More details are here (see section entitled 'Sets'):
> http://dspace.org/technology/system-docs/application.html#oai
> The forthcoming features that affect use of OAI-PMH sets are:
> 1/ UI tool to add Items to more than one Collection.  This tool may also allow Items to be moved from one Collection to another.  In either case, as highlighted by the 'Moving records in and out of sets' thread on OAI-implementers, it is not clear how the OAI-PMH data provider in DSpace should deal with either event.
> 2/ Allowing a richer Community structure, i.e. Communities can contain other Communities.  This may not be a strict hierarchy; for example, a research project Community may be jointly run by two departments at MIT.  I don't believe this could be expressed in the set structure exposed via OAI-PMH.  Additionally, this means the set structure of DSpace at MIT (and other universities) is likely to change significantly and I'm not clear on how this should be exposed via OAI-PMH when the underlying records (Items) have not changed.
> 3/ A Collection may appear in more than one Community.  Again this would seem to break the 'hierarchy' constraint on the OAI set mechanism.
> I can think of a couple of possible directions:
> a/ Drop support for the set mechanism in DSpace completely.  It seems a shame not to expose the structure in DSpace, since I can see selective harvesting for a particular Community might be very useful for a department wanting to add a DSpace search function to their Web page/portal/etc.  However, the structure is becoming more complex than the simple hierarchy OAI-PMH allows.
> b/ Expose DSpace Collections as OAI-PMH sets; these would be flat and not a hierarchy.  This would still allow some selective harvesting, but harvesters would not be able to harvest by Community, which intuitively seems likely to be the most useful selective harvest.  However, this still exposes us to the 'what happens when an Item is moved between or added to additional Collections' issue.
> Out of interest, how many people actually use sets for selective harvesting?  My feeling is that while it's not vital now, as the volume of data in systems like DSpace grows it will become increasingly useful.
> Does anyone have any thoughts or suggestions?
>  Robert Tansley / Hewlett-Packard Laboratories / (+1) 617 551 7624 

hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com