[OAI-implementers] Automatically gathering the full-text of eprints
Wed, 17 Mar 2004 21:17:54 +0000
We've done a preliminary implementation of this at Southampton for:
It took me about an hour to do; I suspect Chris did it in much less time :-)
All the best,
Andy Powell wrote:
> The JISC-funded ePrints UK project has a requirement to automatically
> harvest both metadata and full-text from the eprint archives within UK
> academia (and potentially elsewhere). This is so that we can pass both
> metadata and full-text to the various 'enhancement' Web services offered
> by our partners.
> In order for our harvesting robot to be able to do this, it must be able
> to reliably (and automatically) determine the correct URL(s) for the
> various full-text manifestation(s) (HTML, PDF, RTF, etc.) of each eprint.
> Our "Using simple Dublin Core to describe eprints" guidelines are intended
> to encourage greater consistency in the metadata that is exposed by eprint
> archives using the 'oai_dc' format within the OAI Protocol for Metadata
> Harvesting (OAI-PMH). Somewhat perversely, because we stick rigidly to
> the semantics of the DC element set, our guidelines make determining the
> URL of each manifestation that is available quite difficult. (This is
> largely a consequence of the 'simple' nature of 'simple DC'!) In
> general, the URL in the <dc:identifier> element of the oai_dc record is
> the URL of a jump-off page, rather than a direct link to the full-text.
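To illustrate the starting point a harvester has, here is a minimal sketch of pulling the candidate URL(s) out of an oai_dc record. The namespace URIs are the standard OAI-PMH and Dublin Core ones; the sample record, its URL, and the function name `identifier_urls` are made up for illustration and do not come from any real archive.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by the oai_dc metadata format.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record; the identifier is a placeholder URL.
SAMPLE_RECORD = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example eprint</dc:title>
  <dc:identifier>http://archive.example.org/123/</dc:identifier>
  <dc:format>application/pdf</dc:format>
</oai_dc:dc>
"""


def identifier_urls(record_xml):
    """Return the http(s) URLs found in the record's <dc:identifier> elements."""
    root = ET.fromstring(record_xml)
    urls = []
    for element in root.findall("dc:identifier", NS):
        text = (element.text or "").strip()
        # dc:identifier may also carry non-URL identifiers; skip those.
        if text.startswith(("http://", "https://")):
            urls.append(text)
    return urls


print(identifier_urls(SAMPLE_RECORD))
```

Note that nothing in the record says whether the URL returned is the full-text itself or a jump-off page, which is exactly the ambiguity described above.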
> We would like to suggest a new proposal for unambiguously embedding the
> URL for each manifestation of an eprint into the (X)HTML jump-off page for
> that eprint. Since the jump-off page is generated automatically by the
> eprint archive software, doing this shouldn't be too difficult (in fact,
> we would hope that archive software, such as eprints.org, will be
> configured to do this out of the box).
> If this proposal is adopted, it will make it much easier to write OAI
> service provider software that can reliably gather the full-text of an
> eprint, given only the oai_dc record for that eprint.
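As a rough sketch of what the harvester side might look like: the message above does not say how the URLs are to be embedded, so the code below ASSUMES one possible convention, namely `<link rel="alternate" type="..." href="...">` elements in the head of the jump-off page. That convention, the sample page, and the names `ManifestationLinkParser` and `find_manifestations` are illustrative assumptions, not the proposal's actual mechanism.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ManifestationLinkParser(HTMLParser):
    """Collect (MIME type, absolute URL) pairs from <link rel="alternate"> tags.

    ASSUMPTION: manifestations are advertised with rel="alternate" links;
    the real proposal may use a different convention.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.manifestations = []

    def handle_starttag(self, tag, attrs):
        # Also called for XHTML-style self-closing tags such as <link ... />.
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        href = attributes.get("href")
        if "alternate" in rel_values and href:
            # Resolve relative hrefs against the jump-off page URL.
            self.manifestations.append(
                (attributes.get("type"), urljoin(self.base_url, href))
            )


def find_manifestations(page_html, base_url):
    """Return the manifestation links found in a jump-off page."""
    parser = ManifestationLinkParser(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.manifestations


# Hypothetical jump-off page; all URLs are placeholders.
BASE_URL = "http://archive.example.org/123/"
SAMPLE_PAGE = """\
<html><head>
<title>An example eprint</title>
<link rel="stylesheet" type="text/css" href="/style.css" />
<link rel="alternate" type="application/pdf" href="paper.pdf" />
<link rel="alternate" type="text/html"
      href="http://archive.example.org/123/paper.html" />
</head><body><p>Abstract and download links.</p></body></html>
"""

for mime_type, url in find_manifestations(SAMPLE_PAGE, BASE_URL):
    print(mime_type, url)
```

In a real harvester the page would be fetched from the URL in the oai_dc record's <dc:identifier> element; here it is an inline string so the sketch runs without network access.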
> The proposal is at
> Comments are welcome,
> Distributed Systems, UKOLN, University of Bath, Bath, BA2 7AY, UK
> http://www.ukoln.ac.uk/ukoln/staff/a.powell/ +44 1225 383933
> Resource Discovery Network http://www.rdn.ac.uk/
> ECDL 2004, Bath, UK - 12-17 Sept 2004 - http://www.ecdl2004.org/