[OAI-implementers] TLDP document repository

Tim Brody tdb01r@ecs.soton.ac.uk
Mon, 26 Jan 2004 15:01:31 -0000


OAI-PMH is primarily a 'discovery' protocol. It isn't necessary (or perhaps
desirable) to export your full-texts within the OAI-PMH framework.

The usual use of OAI-PMH is to provide metadata records that point to the
full resource (in Dublin Core, via dc:identifier). You can provide the
full-text as a parallel metadata format, but I would suggest providing a DC
service to begin with and, if there is demand, adding access to the
full-text.
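
As a rough sketch of what such a record might look like, the Python below
builds an oai_dc metadata block whose dc:identifier points at the full
document. The function name, the sample values and the example.org URL are
made up for illustration; only the namespaces and the oai_dc schema location
are fixed by the OAI-PMH specification.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the oai_dc metadata format.
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"

ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xsi", XSI_NS)


def build_oai_dc(title, creator, date, description, identifier_url):
    """Return an oai_dc metadata block as a string (hypothetical helper)."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    root.set(
        f"{{{XSI_NS}}}schemaLocation",
        OAI_DC_NS + " http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    )
    fields = (
        ("title", title),
        ("creator", creator),
        ("date", date),
        ("description", description),
        # dc:identifier carries the link to the full resource.
        ("identifier", identifier_url),
    )
    for tag, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{tag}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # All values below are placeholders, not real LDP data.
    print(build_oai_dc(
        "Example HOWTO",
        "A. N. Author",
        "2004-01-26",
        "A short abstract of the document.",
        "http://example.org/HOWTO/Example-HOWTO/",
    ))
```

A harvester that only understands DC can then follow the identifier to
whichever rendering of the document you choose to link.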

OAI-PMH requires some kind of CGI script to serve the XML data in response
to URI queries. Tools are available in PHP, Java, Perl and Python from
the Web site, http://www.openarchives.org/. An alternative is to use
'oai-static', a single XML file containing all of the data normally
associated with OAI-PMH. That might be more appropriate to the LDP, but I'm
not sure how much support there is out there for static repositories.
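
To give a feel for what such a script has to do, here is a minimal sketch in
Python that handles only the Identify verb and the badVerb error. The base
URL, repository name and admin address are placeholders I have assumed, the
other five verbs are left unimplemented, and the finer argument checks and
the xsi:schemaLocation attribute are omitted, so treat it as an outline
rather than a conforming repository.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs
from xml.sax.saxutils import escape

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
# The six verbs defined by OAI-PMH 2.0.
VERBS = {"Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord"}
BASE_URL = "http://example.org/oai"  # hypothetical base URL


def handle_request(query_string):
    """Return an OAI-PMH response document for a raw query string."""
    args = parse_qs(query_string)
    verb = args.get("verb", [""])[0]
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

    if verb not in VERBS:
        # For badVerb the request element carries no attributes.
        request = f"<request>{escape(BASE_URL)}</request>"
        body = '<error code="badVerb">Illegal or missing verb</error>'
    elif verb == "Identify":
        request = f'<request verb="Identify">{escape(BASE_URL)}</request>'
        body = (
            "<Identify>"
            "<repositoryName>Example Documentation Repository</repositoryName>"
            f"<baseURL>{escape(BASE_URL)}</baseURL>"
            "<protocolVersion>2.0</protocolVersion>"
            "<adminEmail>admin@example.org</adminEmail>"
            "<earliestDatestamp>2004-01-01</earliestDatestamp>"
            "<deletedRecord>no</deletedRecord>"
            "<granularity>YYYY-MM-DD</granularity>"
            "</Identify>"
        )
    else:
        # ListRecords, GetRecord, etc. are out of scope for this sketch.
        raise NotImplementedError(verb)

    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<OAI-PMH xmlns="{OAI_NS}">'
        f"<responseDate>{stamp}</responseDate>"
        f"{request}{body}"
        "</OAI-PMH>"
    )


if __name__ == "__main__":
    print(handle_request("verb=Identify"))
```

A CGI wrapper would pass the QUERY_STRING environment variable to
handle_request and send the result back with a text/xml content type.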

All the best,
Tim Brody

----- Original Message ----- 
From: "Emma Jane Hogbin" <emmajane@xtrinsic.com>
>
> I'm a volunteer with The Linux Documentation Project <www.tldp.org>. We
> currently host in the range of 200 books and articles about Linux, many of
> which are used in classrooms as textbooks, and by system administrators,
> regular people, etc. :) In some cases the LDP documents are actually the
> official documentation for specific open source projects. The documents
> are stored in either DocBook XML, DocBook SGML or LinuxDoc (which is SGML).
> The LDP publishes PS, text, HTML and PDF versions of the documents --
> the source XML/SGML files are also available to anyone who would like them.
>
> I think it would make sense for the LDP to submit its repository to the
> OAI. The first step will be to get our meta-data in order (and make sure
> all of the documents validate, which they /should/). The following is the
> proposed list of elements (from the DocBook DTD) which will be required
> for all publications:
>
> - title
> - authorgroup or author (or authorcorp for organizations)
> - pubdate in the format of YYYY-MM-DD (ISO 8601, the ISO standard for dates)
> - revhistory including at least one revision with:
> <revision>
> <revnumber></revnumber>
> <date></date>
> <authorinitials></authorinitials>
> <revremark></revremark>
> </revision>
> - legalnotice and/or license (License is REQUIRED and must be one of GFDL,
>   Creative Commons, or LDP License)
> - email where the author can be reached.
> - abstract
> - copyright notice
> - acknowledgements (optional)
> - other credits (optional)
> - disclaimer (optional)
>
> What other meta-data information would we need to provide? I'm assuming
> most of this can be paired up to the DublinCore, but I'm not sure if I'm
> missing any other requirements for the DublinCore.
>
> Also, would we have a harvester crawl the site, or would we provide a
> single XML file with a summary of all the docs in the collection?
>
> Thanks!
> emma
>
> -- 
> Emma Jane Hogbin
> [[ 416 417 2868 ][ www.xtrinsic.com ]]
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>