[OAI-implementers] returning *data* (as opposed to metadata)

Michael L. Nelson mln@ils.unc.edu
Thu, 26 Jul 2001 13:24:09 -0400 (EDT)


> 	This was my concern - I can't see why returning full data would be
> bad, in fact it makes a lot of sense from what Michael and Donna have said,
> but it's obviously not quite the intention of the original protocol. Maybe
> all that means is that the OAI should redefine itself as promoting exchange
> of metadata *and* data ... But I wanted to discuss the implications before
> deciding unilaterally that I would start doing weird things with the
> protocol.

doing weird things with the protocol is encouraged ;-)

I see no problem in having the ability to return XML data through OAI.  It
won't work so well with, say, PDF data (or HDF, or MPEG, or ...), but if
it makes sense in your context, do it.  actually, we're all looking to you
to do it and tell us what the issues are!  

> 
> 
> >Can someone clarify the differences/advantages of harvesting data directly
> >using OAI vs. harvesting metadata using OAI followed by harvesting data
> >using HTTP based on the URL in the metadata?
> 

I can't think of a *strong* reason why the data would have to come through
OAI...  unless you can imagine scenarios where the "metadata" and
"data" change independently and at different rates  (sensor data maybe?)
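
To make the two routes concrete, here is a rough sketch of what the requests might look like. The base URL, the record identifier, and the "fulltext_xml" metadata prefix are all made up for illustration; only the GetRecord verb and its identifier/metadataPrefix arguments come from the protocol itself.

```python
from urllib.parse import urlencode


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build an OAI GetRecord request URL for one record in one format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Hypothetical repository and record identifier.
BASE = "http://example.org/oai"
RECORD_ID = "oai:example.org:article-1"

# Route 1: harvest Dublin Core metadata, then fetch the data with a
# separate, ordinary HTTP GET on the URL found in the identifier element.
metadata_request = get_record_url(BASE, RECORD_ID, "oai_dc")

# Route 2: ask the repository for the XML data itself, exposed as just
# another metadata format ("fulltext_xml" is an assumed name).
data_request = get_record_url(BASE, RECORD_ID, "fulltext_xml")

print(metadata_request)
print(data_request)
```

Seen this way, the only difference on the wire is the metadataPrefix, which is why XML data fits into OAI fairly naturally while PDF and similar formats do not.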

>   One reason might be that the data is available in multiple formats.
>   In our case, the URL used as an identifier is a link to an HTML article
> which is rendered from XML. This version looks a lot better to humans and
> the URL is, we think, the appropriate identifier for the article, but
> obviously the HTML wouldn't be so suitable for processing as the XML
> version. We also have PDFs. 
> 	Now we could provide multiple identifier URLs in the oai_dc record
> to allow harvesting that way, I suppose - or is this a valid thing to do? It
> seems to be allowed by the OAI Dublin Core schema:
> 
> 	<element name="identifier" minOccurs="0" maxOccurs="unbounded" type="string"/>
> 
>  but I seem to remember getting the impression from somewhere that you
> should only have one identifier. Could someone clarify this?

I can recall no restrictions on this...  it certainly makes things easier
for the service provider (SP) if you have a single "wrapper" URI that
encompasses all formats, but that's not required.
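
As an illustration of what the SP ends up doing when a record carries several identifiers, here is a minimal sketch. The sample record is invented and its wrapper element is simplified, and guessing the format from the file extension (with no extension taken to mean the HTML article) is only an assumption based on the case described above.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Invented record with one identifier per available format.
SAMPLE = """\
<dc xmlns="http://purl.org/dc/elements/1.1/">
  <title>An example article</title>
  <identifier>http://example.org/articles/1</identifier>
  <identifier>http://example.org/articles/1.xml</identifier>
  <identifier>http://example.org/articles/1.pdf</identifier>
</dc>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def identifiers_by_format(record_xml):
    """Group the identifier URLs in a record by a guessed format."""
    root = ET.fromstring(record_xml)
    found = {}
    for elem in root.iter():
        if local_name(elem.tag) != "identifier" or not elem.text:
            continue
        url = elem.text.strip()
        # Guess the format from the extension of the URL path.
        ext = posixpath.splitext(urlparse(url).path)[1].lstrip(".").lower()
        # Assumption: a URL with no extension is the HTML rendering.
        found.setdefault(ext or "html", []).append(url)
    return found


formats = identifiers_by_format(SAMPLE)
print(formats)
```

A harvester that wants machine-processable content would pick formats["xml"] here; with a single wrapper URI, this guessing step would not be needed.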

wrappers, containers...  I'll take this opportunity to make a self-serving
plug for "buckets" ;-)

http://www.dlib.org/dlib/february01/nelson/02nelson.html

they may or may not do what you want...

regards,

Michael

> 
> Ben Henley <mailto:ben@biomedcentral.com>                    
> Usability Engineer
> BioMed Central    
> http://www.biomedcentral.com

---
Michael L. Nelson			
207 Manning Hall, School of Information and Library Science
University of North Carolina 		mln@ils.unc.edu
Chapel Hill, NC 27599			http://ils.unc.edu/~mln/
+1 919 966 5042				+1 919 962 8071 (f)