[OAI-implementers] Sets in OAI-PMH and DSpace

Tansley, Robert robert.tansley@hp.com
Tue, 21 Oct 2003 08:57:20 -0700


Hello,

This relates to a discussion on the OAI-implementers list about moving records in and out of sets.  The next release of DSpace will add functionality that affects how DSpace might implement sets.

Briefly, DSpace currently has Communities at the top level.  Communities contain Collections, and Collections contain Items.  The structure of Communities and Collections is exposed to harvesters via the OAI-PMH set mechanism.  More details are here (see section entitled 'Sets'):

http://dspace.org/technology/system-docs/application.html#oai
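
As a rough illustration of that mapping, here is a minimal sketch of how a two-level Community/Collection structure could be turned into OAI-PMH sets. OAI-PMH expresses a set hierarchy as a colon-separated setSpec; everything else here (the class names, the `com_1`/`col_10` identifiers, the `list_sets` helper) is hypothetical and is not taken from the DSpace code.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    ident: str  # hypothetical identifier, not a real DSpace ID
    name: str


@dataclass
class Community:
    ident: str
    name: str
    collections: list = field(default_factory=list)


def list_sets(communities):
    """Return (setSpec, setName) pairs: one per Community, one per Collection."""
    sets = []
    for community in communities:
        sets.append((community.ident, community.name))
        for collection in community.collections:
            # A Collection's setSpec is its Community's setSpec plus its own element.
            sets.append((f"{community.ident}:{collection.ident}", collection.name))
    return sets


communities = [
    Community("com_1", "Physics", [
        Collection("col_10", "Theses"),
        Collection("col_11", "Preprints"),
    ]),
]
set_specs = list_sets(communities)
```

With this layout a harvester asking for `com_1` would get every Item in the Community, while `com_1:col_10` narrows the harvest to a single Collection.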

The forthcoming features that affect use of OAI-PMH sets are:

1/ UI tool to add Items to more than one Collection.  This tool may also allow Items to be moved from one Collection to another.  In either case, as highlighted by the 'Moving records in and out of sets' thread on OAI-implementers, it is not clear how the OAI-PMH data provider in DSpace should deal with either event.

2/ Allowing a richer Community structure, i.e. Communities can contain other Communities.  This may not be a strict hierarchy; for example, a research project Community may be jointly run by two departments at MIT.  I don't believe this could be expressed in the set structure exposed via OAI-PMH.  Additionally, this means the set structure of DSpace at MIT (and other universities) is likely to change significantly and I'm not clear on how this should be exposed via OAI-PMH when the underlying records (Items) have not changed.

3/ A Collection may appear in more than one Community.  Again this would seem to break the 'hierarchy' constraint on the OAI set mechanism.
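
To make the difficulty in 2/ and 3/ concrete, here is a small sketch of what happens if a non-strict structure is flattened into path-style setSpecs anyway. The node names and the `set_specs_for` helper are hypothetical, and the sketch assumes the containment graph has no cycles. It is only meant to show the duplication, not to propose it as a solution.

```python
# Maps each node to the Communities that directly contain it (hypothetical ids).
parents = {
    "dept_a": [],
    "dept_b": [],
    "project": ["dept_a", "dept_b"],  # a Community jointly run by two departments
    "reports": ["project"],           # a Collection inside that Community
}


def set_specs_for(node, parents):
    """Return every colon-separated path from a top-level Community down to node."""
    if not parents[node]:
        return [node]  # top-level Community: the path is just its own element
    specs = []
    for parent in parents[node]:
        for prefix in set_specs_for(parent, parents):
            specs.append(f"{prefix}:{node}")
    return specs


report_specs = set_specs_for("reports", parents)
# report_specs == ['dept_a:project:reports', 'dept_b:project:reports']
```

The same Collection now appears as two distinct sets, and any restructuring of the Communities above it changes those setSpecs even though no Item has changed.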

I can think of a couple of possible directions:

a/ Drop support for the set mechanism in DSpace completely.  It seems a shame not to expose the structure in DSpace, since I can see selective harvesting for a particular Community being very useful for a department wanting to add a DSpace search function to their Web page/portal/etc.  However, the structure is becoming more complex than the simple hierarchy OAI-PMH allows.

b/ Expose DSpace Collections as OAI-PMH sets; these would be flat and not a hierarchy.  This would still allow some selective harvesting, but harvesters would not be able to harvest by Community, which intuitively seems likely to be the most useful selective harvest.  However, this still exposes us to the 'what happens when an Item is moved between or added to additional Collections' issue.
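
A minimal sketch of option b/, assuming a flat table of Item-to-Collection membership. The `verb`, `metadataPrefix` and `set` arguments are standard OAI-PMH request parameters; the base URL, the identifiers and the helper functions are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical membership table: Item id -> setSpecs of the Collections it is in.
membership = {
    "item_1": {"col_10"},
    "item_2": {"col_10", "col_11"},  # an Item added to two Collections
}


def records_in_set(membership, set_spec):
    """Items a selective harvest of set_spec would return, in a stable order."""
    return sorted(item for item, sets in membership.items() if set_spec in sets)


def move_item(membership, item, source, target):
    """Move an Item from one Collection to another."""
    sets = membership[item]
    sets.discard(source)
    sets.add(target)


def list_records_url(base_url, set_spec, metadata_prefix="oai_dc"):
    """Build a selective-harvest ListRecords request for one flat set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


before = records_in_set(membership, "col_10")  # ['item_1', 'item_2']
move_item(membership, "item_1", "col_10", "col_11")
after = records_in_set(membership, "col_10")   # ['item_2']
url = list_records_url("http://repository.example.org/oai", "col_10")
```

A harvester that collected `col_10` before the move still holds `item_1`, and nothing in this sketch tells it that the Item has left the set. That is the open question.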

Out of interest, how many people actually use sets for selective harvesting?  My feeling is that while it's not vital now, as the volume of data in systems like DSpace grows it will become increasingly useful.

Does anyone have any thoughts or suggestions?

 Robert Tansley / Hewlett-Packard Laboratories / (+1) 617 551 7624