[OAI-implementers] returning *data* (as opposed to metadata)
Thu, 26 Jul 2001 18:08:47 +0100
>Date: Wed, 25 Jul 2001 23:09:23 +0100 (BST)
>From: Andy Powell <firstname.lastname@example.org>
>To: herbert van de sompel <email@example.com>
>Subject: Re: [OAI-implementers] returning *data* (as opposed to metadata)
>I don't disagree that the protocol *could* be used to harvest data - I
>just wonder if it *should* be used in that way. Particularly at this
>stage in the life of the protocol?
This was my concern. I can't see why returning full data would be bad; in
fact, from what Michael and Donna have said, it makes a lot of sense. But it's
obviously not quite what the original protocol intended. Maybe all that means
is that the OAI should redefine itself as promoting the exchange of metadata
*and* data... But I wanted to discuss the implications before deciding
unilaterally that I would start doing weird things with the
>Can someone clarify the differences/advantages of harvesting data directly
>using OAI vs. harvesting metadata using OAI followed by harvesting data
>using HTTP based on the URL in the metadata?
One reason might be that the data is available in multiple formats.
In our case, the URL used as the identifier links to an HTML version of the
article, rendered from XML. The HTML looks a lot better to humans, and we
think its URL is the appropriate identifier for the article, but the HTML is
obviously less suitable for processing than the XML version. We also have
PDFs.
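For anyone weighing the two approaches in the quoted question, here is a minimal Python sketch of the request each one starts from. It assumes an OAI GetRecord request carrying verb, identifier and metadataPrefix arguments; the base URL, the item identifier and the "article_xml" prefix are hypothetical placeholders, not anything our repository actually exposes.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL; not a real endpoint.
BASE_URL = "http://example.org/oai"

# Hypothetical OAI identifier for one article.
ITEM_ID = "oai:example.org:article-42"


def get_record_url(base_url: str, identifier: str, metadata_prefix: str) -> str:
    """Build a GetRecord request URL for one item in one metadata format."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return f"{base_url}?{query}"


# Approach 1: harvest the data itself through OAI, using a format the
# repository would have to define. "article_xml" is a hypothetical prefix
# standing in for "the full XML article".
direct = get_record_url(BASE_URL, ITEM_ID, "article_xml")

# Approach 2: harvest simple Dublin Core through OAI, then follow the
# dc:identifier URL(s) in the response over plain HTTP as a second step.
two_step = get_record_url(BASE_URL, ITEM_ID, "oai_dc")

print(direct)
print(two_step)
```

In the two-step route the harvester then has to pull the identifier URLs out of the oai_dc response and fetch each one over HTTP, which is where offering several formats behind several URLs comes in.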
I suppose we could instead provide multiple identifier URLs in the oai_dc
record to allow harvesting that way, but is this a valid thing to do? It
seems to be allowed by the OAI Dublin Core schema:
<element name="identifier" minOccurs="0" maxOccurs="unbounded"
but I have a vague impression from somewhere that a record should have only
one identifier. Could someone clarify this?
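To make the multiple-identifier idea concrete, here is a minimal Python sketch of what a harvester might do with such a record. The schema line quoted above declares identifier with maxOccurs="unbounded", so repeating it is structurally allowed; whether that is the intended usage is exactly the open question. The sample record, its URLs and the pick-by-file-extension rule are all hypothetical illustrations, and the record is simplified to bare Dublin Core elements in the DC 1.1 namespace.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Dublin Core Metadata Element Set 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical, simplified DC record with one identifier per format.
SAMPLE_RECORD = f"""\
<dc xmlns="{DC_NS}">
  <title>An example article</title>
  <identifier>http://example.org/articles/42.html</identifier>
  <identifier>http://example.org/articles/42.xml</identifier>
  <identifier>http://example.org/articles/42.pdf</identifier>
</dc>"""


def identifiers(record_xml: str) -> List[str]:
    """Return every dc:identifier value in document order."""
    root = ET.fromstring(record_xml)
    return [
        el.text.strip()
        for el in root.iter(f"{{{DC_NS}}}identifier")
        if el.text
    ]


def pick_by_extension(urls: List[str], ext: str) -> Optional[str]:
    """Return the first URL ending in ext (a naive, hypothetical heuristic)."""
    for url in urls:
        if url.lower().endswith(ext):
            return url
    return None


urls = identifiers(SAMPLE_RECORD)
xml_url = pick_by_extension(urls, ".xml")  # the version suited to processing
print(urls)
print(xml_url)
```

Guessing the format from a file extension is fragile; a harvester would more plausibly rely on something explicit, such as a format element or a separate metadata format, but that choice is outside what this sketch assumes.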
Ben Henley <mailto:firstname.lastname@example.org>