[OAI-implementers] Re: OAI sets as new instances (Sets Proposal (from DLF))

Fri May 6 12:20:44 EDT 2005

A somewhat late response to this thread...

I think that the key to avoiding sets being misused as a poor man's
search is education. We must keep repeating the mantra that this is not
what OAI-PMH was designed to do, and that if search is desired then SRU/SRW
or a similar protocol should be used. Shirley Hyatt and Jeff Young have
demonstrated OAI-PMH and SRU/SRW playing nicely together at OCLC
(http://www.dlib.org/dlib/march05/hyatt/03hyatt.html).

The problem of items moving out of sets is a real one, and one we need to
address within the protocol.

Having said this, there may be situations where Robert's suggestion makes
sense. However, just as with sets, I suspect deployment should be
judicious, meeting real selective-harvesting needs. I note that one
additional requirement for implementation is that items MUST be identified
using a recognized global URI scheme. Otherwise, harvesters should assume
that identifiers within OAI-PMH responses are local, and multiple
repositories exposing the same content would be confusing.
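A minimal sketch of the harvester side of this, assuming a harvester that merges records from several base URLs. Which schemes count as "recognized global" is an assumption here (the GLOBAL_SCHEMES tuple and both function names are hypothetical). Anything else is scoped to the base URL it came from, so two repositories exposing the same item are only recognized as duplicates when the identifier is global.

```python
from urllib.parse import urlsplit

# Assumption: which URI schemes count as "recognized global" is a local
# policy decision; this tuple is purely illustrative.
GLOBAL_SCHEMES = ("http", "https", "urn", "info")

def is_global(identifier: str) -> bool:
    """True if the identifier uses one of the assumed global URI schemes."""
    return urlsplit(identifier).scheme.lower() in GLOBAL_SCHEMES

def record_key(base_url: str, identifier: str) -> tuple:
    """Key deciding whether records from different base URLs are the same item.

    Global identifiers are compared on their own; anything else is treated
    as local and scoped by the repository's base URL.
    """
    if is_global(identifier):
        return ("global", identifier)
    return ("local", base_url, identifier)
```

With this policy, the same `http://example.org/item/1` harvested from two set-repositories collapses to one key, while a bare local id such as `12345` from two base URLs stays distinct.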

Cheers,
Simeon

On Mon, 25 Apr 2005, Dr Robert Sanderson wrote:
> On Fri, 22 Apr 2005, Thomas G. Habing wrote:
> > time articulating.  Perhaps the problem is that there are several different
> > issues with sets, and I'm not sure which of these we are really trying to
> > address.
> >
> > 1) The tendency of people to misunderstand sets as a sort of poor man's
> > search.
>
> I think that moving the set name into the URL doesn't get rid of
> this, but it does lessen the tendency to think this way. When it's a
> parameter in the query, it's easy to cram any arbitrary value in there.
> It's less intuitive to do this when the set name is part of the URL.
>
> > 2) Technical issues relating to how to signal that a record has been moved
> > out of a set, but has not been deleted from the repository.
>
> This wasn't something I was thinking of when writing it up, but it does
> fall out neatly from the proposal: you simply mark the records as deleted
> in the set repository.
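A minimal sketch of what marking a record deleted could look like in a set-repository's ListIdentifiers or ListRecords output, assuming the item has left the set but is still live in the main repository. The function name is hypothetical and the OAI-PMH namespace is omitted for brevity; the `<header status="deleted">` structure itself is the standard OAI-PMH one.

```python
import xml.etree.ElementTree as ET

def set_repo_header(identifier: str, datestamp: str, in_set: bool) -> ET.Element:
    """Build an OAI-PMH <header> as a set-repository would expose it.

    A record that has left the set (but still exists in the main
    repository) is reported by the set-repository with status="deleted".
    """
    header = ET.Element("header")
    if not in_set:
        header.set("status", "deleted")
    ET.SubElement(header, "identifier").text = identifier
    # For a removal, the datestamp would be when the item left the set.
    ET.SubElement(header, "datestamp").text = datestamp
    return header
```

The set-repository's Identify response would also need to advertise deletedRecord support (transient or persistent); otherwise harvesters cannot rely on seeing these headers.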
>
> > 3) How best to describe a set: there is a technical description such as how
> > many items are in the set and what the update frequency is.  There is also
> > the conceptual description, such as the records in this set are all described
> > by this subject heading, or they all belong to this "collection," or they all
> > have this publishing status.
>
> The advantage here is that set descriptions can reuse all of the best
> practices and schemas for the Identify verb. What exactly to put in them
> still needs work, but I think allowing the full Identify information is a
> good start.
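A minimal sketch of an Identify response for a set-repository, assuming each set is served at its own base URL. It fills in only the elements OAI-PMH 2.0 requires in every Identify response, in schema order; the function name, example values, and the choice of deletedRecord "persistent" (so that items leaving the set stay visible as deletions) are assumptions. The set's conceptual description would go in optional `<description>` containers.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"

def set_identify(name: str, base_url: str, admin_email: str,
                 earliest: str) -> ET.Element:
    """Return the <Identify> element a set-repository might serve."""
    ident = ET.Element(f"{{{OAI_NS}}}Identify")
    fields = [
        ("repositoryName", name),
        ("baseURL", base_url),
        ("protocolVersion", "2.0"),
        ("adminEmail", admin_email),
        ("earliestDatestamp", earliest),
        ("deletedRecord", "persistent"),  # assumption, see lead-in
        ("granularity", "YYYY-MM-DD"),
    ]
    for tag, value in fields:
        ET.SubElement(ident, f"{{{OAI_NS}}}{tag}").text = value
    return ident
```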
>
> > 4) Issues such as whether it's a good idea to have overlapping sets, flat
> > sets, hierarchical sets, and in which circumstances.
>
> Whether it's a good idea? I'm not going to comment on that, beyond
> pointing out that there are hierarchical collections and sub-collections,
> so it's natural to describe these as a hierarchical tree of sets.
> The main advantage here is that everything falls out neatly: if you want
> a tree, then design your URLs to be a tree. If you want overlapping, flat
> or any other design, then it's up to the design of the URL paths, rather
> than the protocol trying to fit all of the requirements.
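A minimal sketch of designing the URLs as a tree, assuming a hypothetical collection hierarchy held in a dict keyed by path. The set structure is carried entirely by the paths: sub-sets are found by path prefix, and overlap is simply two paths that share items. The names SETS, set_base_url and children are illustrative, not part of the proposal.

```python
# Hypothetical collection hierarchy: each path segment is a sub-collection.
SETS = {
    "": {"a", "b", "c", "d"},     # the whole repository
    "theses": {"a", "b"},
    "theses/physics": {"a"},
    "images": {"b", "c"},          # overlaps with "theses" on item b
}

def set_base_url(root: str, path: str) -> str:
    """Base URL of the set-repository for a path; "" is the main repository."""
    return root.rstrip("/") + ("/" + path if path else "")

def children(path: str) -> list:
    """Immediate sub-sets of a path, derived purely from the URL structure."""
    prefix = path + "/" if path else ""
    return sorted(
        p for p in SETS
        if p.startswith(prefix) and p != path and "/" not in p[len(prefix):]
    )
```

A flat design is just a dict with no `/` in any key; nothing in the code has to change.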
>
>
> > 5) Variations in how different implementers have interpreted the OAI
> > "data model".
>
> I don't think that the proposal addresses this.
>
> > Briefly some of my misgivings:
> > Does Rob's model place an excessive burden on data providers, or service
> > providers?
>
> Data providers can take on the burden in at least two different ways:
> either multiple instances of the script, or one server which handles
> everything. Multiple instances is easier than the status quo (no sets,
> no extra URLs). One server is at least as hard as the status quo:
> depending on the underlying architecture it may be no more difficult, or
> it may be quite a bit harder (at which point, there's always the option of
> running multiple instances of the server code).
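For the one-server option, a minimal sketch assuming hierarchical membership (an item in theses/physics also counts as being in theses) and a hypothetical in-memory item store. A single handler serves every set-repository base URL: the request path picks the set, and the same store is filtered accordingly. Only ListIdentifiers is shown; other verbs would filter the same way.

```python
# Hypothetical item store: identifier -> collection paths it belongs to.
ITEMS = {
    "a": {"theses/physics"},
    "b": {"theses", "images"},
    "c": {"images"},
}

def in_set(memberships: set, set_path: str) -> bool:
    """An item is in a set if it belongs to that path or any descendant of it."""
    return any(m == set_path or m.startswith(set_path + "/") for m in memberships)

def list_identifiers(path_info: str) -> list:
    """ListIdentifiers for whichever set-repository the request path selects."""
    set_path = path_info.strip("/")
    if not set_path:  # the root URL is the full repository
        return sorted(ITEMS)
    return sorted(i for i, m in ITEMS.items() if in_set(m, set_path))
```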
>
> For service providers, it should be easier, as they can simply follow the
> links in the <friends> section, rather than having to construct parameters
> from the ListSets response.
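A minimal sketch of the service-provider side, assuming each set-repository's Identify response lists its sub-sets as baseURL entries in a friends description container (namespace http://www.openarchives.org/OAI/2.0/friends/). The fetch_identify callback is a hypothetical stand-in for an HTTP request to baseURL?verb=Identify, so the traversal can be tested offline.

```python
import xml.etree.ElementTree as ET
from collections import deque
from typing import Callable, List

FRIENDS_NS = "http://www.openarchives.org/OAI/2.0/friends/"

def friend_base_urls(identify_xml: str) -> List[str]:
    """Return every baseURL listed in friends containers of an Identify response."""
    root = ET.fromstring(identify_xml)
    return [e.text.strip() for e in root.iter(f"{{{FRIENDS_NS}}}baseURL") if e.text]

def crawl(start: str, fetch_identify: Callable[[str], str]) -> List[str]:
    """Breadth-first walk over friends links, visiting each base URL once."""
    seen: List[str] = []
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url in seen:
            continue
        seen.append(url)
        queue.extend(friend_base_urls(fetch_identify(url)))
    return seen

# Example Identify response for a hypothetical root repository.
ROOT_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example root</repositoryName>
    <description>
      <friends xmlns="http://www.openarchives.org/OAI/2.0/friends/">
        <baseURL>http://example.org/oai/theses</baseURL>
        <baseURL>http://example.org/oai/images</baseURL>
      </friends>
    </description>
  </Identify>
</OAI-PMH>"""
```

The harvester never builds a `set=` parameter; it just harvests each base URL it discovers.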
>
>
> > Does it fundamentally alter the underlying data model of OAI, for better or
> > worse?  Previously, I think that items belonged to one or more sets, and
> > records were disseminations of these items in a specific format.  I think
> > Rob's model alters this to something like records being disseminations of
> > items within the context of those items being contained in a particular set.
>
> Mmmmm. I have no real comment here.  There's nothing to prevent you from
> having different representations of the same object disseminated in
> different sets, but that's no different to today where some providers make
> sets available per record schema.
>
> I think that's a best practice issue which should be addressed, but is
> mostly orthogonal to the proposal?
>
> Rob
>
>        ,'/:.          Dr Robert Sanderson (azaroth at liverpool.ac.uk)
>      ,'-/::::.        http://www.csc.liv.ac.uk/~azaroth/
>    ,'--/::(@)::.      Dept. of Computer Science, Room 805
> ,'---/::::::::::.    University of Liverpool
> ____/:::::::::::::.
> I L L U M I N A T I  Cheshire3 IR System:  http://www.cheshire3.org/
>