[OAI-implementers] returning *data* (as opposed to metadata)

Herbert Van de Sompel herbertv@CS.Cornell.EDU
Thu, 26 Jul 2001 09:34:23 -0400 (EDT)

hi Andy,

On Wed, 25 Jul 2001, Andy Powell wrote:

> On Wed, 25 Jul 2001, herbert van de sompel wrote:
> > But, obviously, if a data provider has full-content in a well-structured
> > format, he may as well consider exposing references in a preparsed
> > manner as yet another separate "information packet": doing so would lead
> > to 3 "metadata-formats" related to the same content: metadata, data,
> > references.
> Hmmm... the protocol is named "The Open Archives Initiative Protocol
> for Metadata Harvesting".  Using it to harvest "Data" seems reasonably
> non-intuitive to me, given this name.
> I have no problem with the notion that one person's metadata is another
> person's data.  It is a pretty extreme view to say that data in the form
> of the full content of an article is metadata - which is what you appear
> to be saying above?
> I don't disagree that the protocol *could* be used to harvest data - I
> just wonder if it *should* be used in that way.  Particularly at this
> stage in the life of the protocol?

I think it is important to emphasize here that so far I felt we had a
brainstorming session going on about the possibility of using the protocol
to harvest other types of content than only discovery metadata:  

* There was a posting about an idea to expose full-content.  This is an
idea that has actually come up several times before, and has even been
suggested by one or two publishers.  Their idea was that they wouldn't
object at all to exposing an ASCII version of a full journal article using
the protocol, to allow service providers to index full-content for better
discovery of their full-fledged full content.

* There was a posting by me in which I shared the fact that in a few
presentations at conferences I have been "speculating" about the usage of
the OAI protocol for the harvesting of metadata of which the core purpose
is not discovery.  Actually, at the March Geneva meeting on peer-review we
had a workshop concentrating on the question whether and how certification
metadata could be harvested using the protocol.  At IATUL in Delft, Mr.
Steenbakker from the Royal Library in the Netherlands -- one of the few
world experts in digital deposit systems and digital preservation --  came
to see me after the session in which we both gave a presentation.  For the
Dutch deposit system for digital materials, he is working within the
context of the Open Archival Information Systems specifications (as many
European national libraries do).  Mr. Steenbakker was inspired by the
presentation on the OAI protocol, and felt the protocol could potentially
play a significant role for syncing preservation-related metadata between
the several components of an OAIS environment.          

So, this is a long expose, basically to say that one cannot keep people
from thinking and speculating about usages of the protocol beyond what
its original target area was when the OAI technical group created it.
Also, I don't see how other usages could be "illegalized", as long as they
remain compliant with the existing technical specifications.  

Having said all of that, I do agree with Andy's implicit suggestion that
the focus of our attention should remain in the discovery-area.  It is
there that we are looking for proofs of concept regarding the
applicability of the protocol.  But, again, we can't keep people from
thinking and exploring beyond that.

> Can someone clarify the differences/advantages of harvesting data directly
> using OAI vs. harvesting metadata using OAI followed by harvesting data
> using HTTP based on the URL in the metadata?

I think that the elegance of such an approach becomes evident only when
thinking of container-like content structures (for instance Michael
Nelson's buckets) in which a single "key" can be used to access many
different types of data: different types of metadata, references,
full-text, ...
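
To make the "single key" idea a bit more concrete, here is a minimal
sketch in Python.  It only builds GetRecord request URLs in which one
identifier is combined with several metadataPrefix values.  The base URL,
the identifier, and the "fulltext" and "references" prefixes are all
assumptions made up for illustration; only oai_dc is a prefix one can
expect a repository to support, and nothing here is an agreed convention.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL, used for illustration only.
BASE_URL = "http://archive.example.org/oai"


def get_record_url(identifier, metadata_prefix, base_url=BASE_URL):
    """Build a GetRecord request URL for one identifier and one prefix."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


def packet_urls(identifier, prefixes):
    """Map each prefix to the request URL for the same single key."""
    return {prefix: get_record_url(identifier, prefix) for prefix in prefixes}


if __name__ == "__main__":
    # "fulltext" and "references" are invented prefixes, not standard ones.
    urls = packet_urls("oai:example:article-42",
                       ["oai_dc", "fulltext", "references"])
    for prefix, url in urls.items():
        print(prefix, url)
```

A harvester following this pattern would not need to parse a URL out of
the metadata and then switch to plain HTTP: the same key and the same
request shape would reach each "information packet".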

Which brings me back to yesterday's METS posting.  From a quick reading I
haven't been able to figure out whether a unique key for a complete METS
package exists, i.e. I haven't been able to figure out what the glue is
that keeps descriptive meta, admin meta, file groups and structural map
together.  If there is such a key, then probably the OAI protocol could be
used to get to individual sections of a METS package.

herbert van de sompel (sorry for the lengthy mail)