
Implementation Guidelines for the Open Archives Initiative Protocol for Metadata Harvesting

- Specification for an OAI Static Repository and an OAI Static Repository Gateway

Protocol Version 2.0 of 2002-06-14
Document Version 2002/11/11T00:40:00Z
http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm

alpha release. do not distribute.

Editors

The OAI Executive:
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computing and Information Science

From the OAI Technical Committee:
Michael Nelson <mln@cs.odu.edu> -- Old Dominion University - Dept of Computer Science
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computing and Information Science

Contributors:
Patrick Hochstenbach <hochsten@lanl.gov> -- Los Alamos National Laboratory - Research Library
Henry Jerez <hjerez@lanl.gov> -- Los Alamos National Laboratory - Research Library

This document is one part of the Implementation Guidelines that accompany the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

1. Introduction

For an adequate comprehension of this specification, a prior understanding of the OAI-PMH is required.

The specification of a Static Repository, introduced here, provides a simple approach for exposing relatively static and small collections of metadata records through the OAI-PMH. The Static Repository approach is targeted at data providers that:

The Static Repository is an XML file that is made accessible by a data provider at a persistent network-location. The XML file has a well-defined structure and it contains information similar to that in OAI-PMH responses. This includes metadata records and supporting information required for the purpose of harvesting via the OAI-PMH.

A Static Repository becomes harvestable via the OAI-PMH through the intermediation of a Static Repository Gateway. The properties of a Static Repository Gateway are also described in this specification. A Static Repository Gateway can make one or more Static Repositories harvestable. Because a Static Repository Gateway assigns a unique base URL to each such Static Repository, harvesters can harvest Static Repository information in exactly the same manner as they harvest any other OAI-PMH Repository.

Both the Static Repository and the Static Repository Gateway are described in the remainder of this document. They are further clarified through the accompanying figure.

[Figure: Static Repositories, a Static Repository Gateway and an OAI-PMH Harvester]

2. Concepts and definitions

Some concepts that are essential for an understanding of this specification are defined below.

Static Repository: A Static Repository is an XML file that is valid according to the XML Schema provided in Section 3.2. A Static Repository contains metadata records and supporting information required for the purpose of harvesting via the OAI-PMH. A Static Repository is managed by a data provider at a persistent network-location of its choice. A Static Repository is not an OAI-PMH Repository, because it is a file, not a server that can respond to the six OAI-PMH requests.

Static Repository network-location: The network-location of a Static Repository is the HTTP address where the Static Repository is accessible.

Static Repository Gateway: A Static Repository Gateway makes one or more Static Repositories harvestable via the OAI-PMH. It assigns each such Static Repository a unique base URL and thereby exposes each individual Static Repository as an individual OAI-PMH Repository. A Static Repository Gateway complies with restrictions and conventions described in Section 4.

3. Static Repository Description

3.1 Static Repository restrictions and conventions

Static Repositories are meant for small and relatively static metadata collections. All metadata, identifiers, and datestamps are managed in a single XML file. This file may be created manually with an XML editing tool, or a text processing application. Alternatively, a Static Repository might be generated periodically by a script that extracts information from an existing database.

The following restrictions and conventions apply to a Static Repository:

3.2 Static Repository format

The structure of the Static Repository is defined by means of the XML Schema shown below. Static Repositories must validate against this XML Schema.

Static Repository XML Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://www.openarchives.org/OAI/2.0/static-repository" 
           xmlns:oai="http://www.openarchives.org/OAI/2.0/" 
           xmlns:sr="http://www.openarchives.org/OAI/2.0/static-repository" 
           xmlns:xs="http://www.w3.org/2001/XMLSchema" 
           elementFormDefault="qualified" 
           attributeFormDefault="unqualified">
  <xs:import namespace="http://www.openarchives.org/OAI/2.0/" 
             schemaLocation="http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"/>
  <xs:annotation>
    <xs:documentation>This XML Schema specifies the structure of a so-called OAI-PMH Static Repository.  
     A Static Repository is an XML file that is valid according to this XML Schema.  
     A Static Repository is made accessible as an XML file on a standard web-server.  
     No special software is required at the end of the organization that makes the Static Repository
     available.  
     A Static Repository becomes harvestable via the OAI-PMH through the intermediation of a 
     Static Repository Gateway.
     The following important restrictions apply to a Static Repository: 
     (1) A Static Repository does not support OAI-PMH "sets".  
     (2) A Static Repository does not support the OAI-PMH notion of "deleted records". 
     (3) The only supported harvesting granularity is YYYY-MM-DD.  
     (4) The content of the "baseURL" element - child of the "Identify" 
         element in the XML Static Repository - must be the Static Repository network-location.
     (5) A Static Repository must not have a resumptionToken element as a child of the ListRecords 
         element in the XML Static Repository. 
     (6) the "datestamp" element of records must be of the form YYYY-MM-DD.</xs:documentation>
  </xs:annotation>
  <xs:annotation>
    <xs:documentation>This Static Repository XML Schema by Herbert Van de Sompel and Henry N. Jerez.  
      Los Alamos National Laboratory, Research Library, Digital Library Research and Prototyping Team.  
      October 26th 2002.  Inspired by the ViDa work by Steven Bird for OAI-PMH v.1.0 and 
      for the Open Language Archives Community; 
      see  http://www.language-archives.org/docs/implement.html#Vida
    </xs:documentation>
  </xs:annotation>
  <xs:element name="Repository" type="sr:repo"/>
  <xs:complexType name="repo">
    <xs:annotation>
      <xs:documentation>The Repository element has 2 child elements -- "Identify" and 
      "ListMetadataFormats" -- that are directly derived from the XML Schema defining responses 
      to OAI-PMH v.2.0 requests.  
      The third repeatable element -- "ListRecords" -- is an extension of the ListRecords
      defined in the XML Schema defining responses to OAI-PMH v.2.0 requests; it has an additional
      attribute indicating the metadataPrefix of the included metadata records.</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element name="Identify" type="oai:IdentifyType"/>
      <xs:element name="ListMetadataFormats" type="oai:ListMetadataFormatsType"/>
      <xs:element name="ListRecords" type="sr:ListRecordsType" minOccurs="1" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="ListRecordsType">
    <xs:complexContent>
      <xs:annotation>
        <xs:documentation>The ListRecords element contains all records with metadata expressed according to one
        of the metadata formats supported by the Static Repository. The "metadataPrefix" attribute
        specifies the metadataPrefix of the included metadata; it must correspond with a value
        of the metadataPrefix element contained in the ListMetadataFormats element.</xs:documentation>
      </xs:annotation>
      <xs:extension base="oai:ListRecordsType">
        <xs:attribute name="metadataPrefix" type="oai:metadataPrefixType" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>
</xs:schema>
This Schema is available at http://www.openarchives.org/OAI/2.0/static-repository.xsd
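
Schema validation alone does not enforce the six restrictions listed in the schema documentation. The following Python sketch shows one way a data provider might check them before publishing a file. It is an illustration only: the function name `check_restrictions` is hypothetical, restriction (2) is interpreted here as "deletedRecord is no and no header has status deleted", and the file is assumed to use the schema's target namespace for the Repository, Identify and ListRecords elements.

```python
import re
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
SR = "http://www.openarchives.org/OAI/2.0/static-repository"
DAY = re.compile(r"\d{4}-\d{2}-\d{2}")


def check_restrictions(xml_text, network_location):
    """Return a list of violated restrictions (empty if none were found)."""
    root = ET.fromstring(xml_text)
    identify = f"{{{SR}}}Identify/"
    problems = []

    # (1) No sets: no setSpec element may appear anywhere in the file.
    if root.find(f".//{{{OAI}}}setSpec") is not None:
        problems.append("setSpec present, but sets are not supported")

    # (2) No deleted records (interpretation, see the lead-in).
    if root.findtext(identify + f"{{{OAI}}}deletedRecord", "").strip() != "no":
        problems.append("deletedRecord is not 'no'")
    if any(h.get("status") == "deleted" for h in root.iter(f"{{{OAI}}}header")):
        problems.append("a record header is marked as deleted")

    # (3) The only supported granularity is YYYY-MM-DD.
    if root.findtext(identify + f"{{{OAI}}}granularity", "").strip() != "YYYY-MM-DD":
        problems.append("granularity is not YYYY-MM-DD")

    # (4) baseURL must be the Static Repository network-location.
    if root.findtext(identify + f"{{{OAI}}}baseURL", "").strip() != network_location:
        problems.append("baseURL differs from the network-location")

    # (5) No resumptionToken inside ListRecords.
    if root.find(f".//{{{OAI}}}resumptionToken") is not None:
        problems.append("resumptionToken present")

    # (6) Record datestamps must have the form YYYY-MM-DD.
    #     Only the form is checked, not whether the date exists.
    for stamp in root.iter(f"{{{OAI}}}datestamp"):
        if not DAY.fullmatch((stamp.text or "").strip()):
            problems.append(f"datestamp {stamp.text!r} is not YYYY-MM-DD")

    return problems
```

An empty result means that none of the six restrictions was found to be violated; it does not replace validation against the XML Schema.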

3.3 Making a Static Repository harvestable through a Static Repository Gateway

A Static Repository must use the intermediation of a single Static Repository Gateway to make its metadata harvestable. In order to make the existence of a Static Repository known to a chosen Static Repository Gateway, the administrator of the Static Repository must issue an OAI-PMH Identify request to the base URL at which the chosen Static Repository Gateway will make the Static Repository harvestable. This action automatically establishes the relationship between the chosen Static Repository Gateway and the Static Repository. The administrator must construct that base URL for the Static Repository by following the base URL convention provided in Section 4.1.
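
As an illustration, the Identify request mentioned above is an ordinary HTTP request with the OAI-PMH argument `verb=Identify` appended to the base URL. The Python sketch below only builds the request URL; the function name `identify_request_url` is hypothetical and the base URL is the one used as an example in Section 4.1.

```python
from urllib.parse import urlencode


def identify_request_url(base_url):
    """Return the URL of an OAI-PMH Identify request against base_url."""
    return base_url + "?" + urlencode({"verb": "Identify"})


url = identify_request_url(
    "http://gateway.institution.org/oai/an.oai.org/ma/mini.xml"
)
print(url)
# prints http://gateway.institution.org/oai/an.oai.org/ma/mini.xml?verb=Identify
```

The administrator could then open this URL in a browser or fetch it with any HTTP client.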

3.4 Static Repository example

Below, an example is shown of a Static Repository. As can be seen, it contains:

Static Repository example

<?xml version="1.0" encoding="UTF-8"?>
<Repository xmlns="http://www.openarchives.org/OAI/2.0/static-repository" 
            xmlns:oai="http://www.openarchives.org/OAI/2.0/" 
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
            xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/static-repository 
                                http://www.openarchives.org/OAI/2.0/static-repository.xsd">
  <Identify>
    <oai:repositoryName>Demo repository</oai:repositoryName>
    <oai:baseURL>http://an.oai.org/ma/mini.xml</oai:baseURL>
    <oai:protocolVersion>2.0</oai:protocolVersion>
    <oai:adminEmail>jondoe@oai.org</oai:adminEmail>
    <oai:earliestDatestamp>2002-09-19</oai:earliestDatestamp>
    <oai:deletedRecord>no</oai:deletedRecord>
    <oai:granularity>YYYY-MM-DD</oai:granularity>
  </Identify>
  <ListMetadataFormats>
    <oai:metadataFormat>
      <oai:metadataPrefix>oai_dc</oai:metadataPrefix>
      <oai:schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</oai:schema>
      <oai:metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/
          </oai:metadataNamespace>
    </oai:metadataFormat>
    <oai:metadataFormat>
      <oai:metadataPrefix>oai_rfc1807</oai:metadataPrefix>
      <oai:schema>http://www.openarchives.org/OAI/1.1/rfc1807.xsd</oai:schema>
      <oai:metadataNamespace>http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt
           </oai:metadataNamespace>
    </oai:metadataFormat>
  </ListMetadataFormats>
  <ListRecords metadataPrefix="oai_dc">
  <oai:record> 
     <oai:header>
       <oai:identifier>oai:arXiv:cs/0112017</oai:identifier> 
       <oai:datestamp>2001-12-14</oai:datestamp>
     </oai:header>
     <oai:metadata>
       <oai_dc:dc 
          xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" 
          xmlns:dc="http://purl.org/dc/elements/1.1/" 
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
          xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
          http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
         <dc:title>Using Structural Metadata to Localize Experience of 
                   Digital Content</dc:title> 
         <dc:creator>Dushay, Naomi</dc:creator>
         <dc:subject>Digital Libraries</dc:subject> 
         <dc:description>With the increasing technical sophistication of 
             both information consumers and providers, there is 
             increasing demand for more meaningful experiences of digital 
             information. We present a framework that separates digital 
             object experience, or rendering, from digital object storage 
             and manipulation, so the rendering can be tailored to 
             particular communities of users.
         </dc:description> 
         <dc:description>Comment: 23 pages including 2 appendices, 
             8 figures</dc:description> 
         <dc:date>2001-12-14</dc:date>
       </oai_dc:dc>
     </oai:metadata>
   </oai:record>  
   <oai:record>
     <oai:header>
       <oai:identifier>oai:perseus:Perseus:text:1999.02.0084</oai:identifier>
       <oai:datestamp>2002-05-01</oai:datestamp>
     </oai:header>
     <oai:metadata>
       <oai_dc:dc 
           xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" 
           xmlns:dc="http://purl.org/dc/elements/1.1/" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
           http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
         <dc:title>Opera Minora</dc:title>
         <dc:creator>Cornelius Tacitus</dc:creator>
         <dc:type>text</dc:type>
         <dc:source>Opera Minora. Cornelius Tacitus. Henry Furneaux. 
          Clarendon Press. Oxford. 1900.</dc:source>
         <dc:language>latin</dc:language>
         <dc:identifier>http://www.perseus.tufts.edu/cgi-bin/ptext?
           doc=Perseus:text:1999.02.0084</dc:identifier>
       </oai_dc:dc>
     </oai:metadata>
   </oai:record>
   <oai:record>
     <oai:header>
       <oai:identifier>oai:perseus:Perseus:text:1999.02.0083</oai:identifier>
       <oai:datestamp>2002-05-01</oai:datestamp>
     </oai:header>
     <oai:metadata>
       <oai_dc:dc 
           xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" 
           xmlns:dc="http://purl.org/dc/elements/1.1/" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
           http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
         <dc:title>Germany and its Tribes</dc:title>
         <dc:creator>Tacitus</dc:creator>
         <dc:type>text</dc:type>
         <dc:source>Complete Works of Tacitus. Tacitus. Alfred John Church. 
          William Jackson Brodribb. Lisa Cerrato. edited for Perseus. 
          New York: Random House, Inc. Random House, Inc. reprinted 1942.
           </dc:source>
         <dc:language>english</dc:language>
         <dc:identifier>http://www.perseus.tufts.edu/cgi-bin/ptext?
          doc=Perseus:text:1999.02.0083</dc:identifier>
       </oai_dc:dc>
       </oai:metadata>
     <oai:about>
       <provenance 
        xmlns="http://www.openarchives.org/OAI/2.0/provenance/" 
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
        xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance/
        http://www.openarchives.org/OAI/2.0/provenance/oai_provenance.xsd">
         <originDescription>
           <baseURL>http://an.oa.org</baseURL>
           <identifier>oai:r1:plog/9801001</identifier>
           <datestamp>2001-08-13T13:00:02Z</datestamp>
           <metadataPrefix>oai_dc</metadataPrefix>
           <harvestDate>2001-08-15T12:01:30Z</harvestDate>
         </originDescription>
         <originDescription>
           <baseURL>http://the.oa.org</baseURL>
           <identifier>oai:r2:klik001</identifier>
           <datestamp>2002-01-01</datestamp>
           <metadataPrefix>oai_dc</metadataPrefix>
           <harvestDate>2002-02-02T14:10:02Z</harvestDate>
         </originDescription>
       </provenance>
     </oai:about>
   </oai:record>
  </ListRecords>
  <ListRecords metadataPrefix="oai_rfc1807">
   <oai:record>
     <oai:header>
       <oai:identifier>oai:arXiv:hep-th/9901001</oai:identifier>
       <oai:datestamp>1999-12-25</oai:datestamp>
     </oai:header>
     <oai:metadata>
      <rfc1807 xmlns=
         "http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt" 
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
       xsi:schemaLocation=
        "http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1807.txt
         http://www.openarchives.org/OAI/1.1/rfc1807.xsd">
         <bib-version>v2</bib-version>
         <id>hep-th/9901001</id>
         <entry>January 1, 1999</entry>
         <title>Investigations of Radioactivity</title>
         <author>Ernest Rutherford</author>
         <date>March 30, 1999</date>
      </rfc1807>
     </oai:metadata>
     <oai:about>
       <oai_dc:dc 
           xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" 
           xmlns:dc="http://purl.org/dc/elements/1.1/" 
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
           xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ 
           http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
         <dc:publisher>Los Alamos arXiv</dc:publisher>
         <dc:rights>Metadata may be used without restrictions as long as 
            the oai identifier remains attached to it.</dc:rights>
       </oai_dc:dc>
     </oai:about>
   </oai:record>
  </ListRecords>
</Repository>
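
To illustrate how the structure of such a file lends itself to harvesting, the Python sketch below selects the records that a ListRecords request for a given metadataPrefix and optional date range would cover. It is a sketch under assumptions, not a description of any particular gateway: the function name `select_records` is hypothetical, the file is assumed to use the schema's target namespace, and the `from_` and `until` bounds are treated as inclusive, as in OAI-PMH selective harvesting.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"
SR = "http://www.openarchives.org/OAI/2.0/static-repository"


def select_records(repository_root, metadata_prefix, from_=None, until=None):
    """Return the oai:record elements stored for metadata_prefix whose
    datestamp lies within [from_, until]; a bound of None is ignored."""
    selected = []
    for list_records in repository_root.findall(f"{{{SR}}}ListRecords"):
        # Each ListRecords element holds the records of one metadata format.
        if list_records.get("metadataPrefix") != metadata_prefix:
            continue
        for record in list_records.findall(f"{{{OAI}}}record"):
            datestamp = record.findtext(
                f"{{{OAI}}}header/{{{OAI}}}datestamp", ""
            ).strip()
            # YYYY-MM-DD strings sort chronologically, so plain string
            # comparison is sufficient here.
            if from_ is not None and datestamp < from_:
                continue
            if until is not None and datestamp > until:
                continue
            selected.append(record)
    return selected
```

A call such as `select_records(ET.parse("mini.xml").getroot(), "oai_dc", from_="2002-01-01")` would return only the oai_dc records with a datestamp on or after that day; the file name is a placeholder.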

4. Static Repository Gateway description

A Static Repository Gateway provides access via OAI-PMH to the data provided by one or more Static Repositories. This section specifies the implementation and OAI-PMH protocol issues in creating and maintaining a Static Repository Gateway.

4.1 Assigning a unique base URL for each Static Repository

A Static Repository Gateway must assign a unique base URL to each Static Repository that it makes accessible via the OAI-PMH. In accordance with the OAI-PMH, such a unique base URL assigned to a Static Repository becomes the target for OAI-PMH requests directed at that Static Repository.

The base URL that a Static Repository Gateway assigns to a Static Repository must be constructed as a concatenation of:

For instance, if the Static Repository Gateway is at http://gateway.institution.org/oai/ and, if a Static Repository network-location is http://an.oai.org/ma/mini.xml, then the Static Repository Gateway must make the given Static Repository harvestable at base URL http://gateway.institution.org/oai/an.oai.org/ma/mini.xml . Also, the Static Repository Gateway must use this value as the content of the "baseURL" element used in the response to the Identify request issued against the given Static Repository.
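
The following Python sketch illustrates the convention as it can be inferred from the example above: the Static Repository network-location, without its leading `http://`, is appended to the Static Repository Gateway URL. Percent-encoding of characters such as the port separator is an assumption based on the entry `loca.org%3A8080/data` in the Identify example of Section 4.3. The function name `gateway_base_url` is hypothetical, and the gateway URL is assumed to end with a slash.

```python
from urllib.parse import quote


def gateway_base_url(gateway_url, network_location):
    """Derive the base URL a gateway assigns to a Static Repository.

    Assumption: the rule is the one suggested by the examples in this
    document, not a normative restatement of the convention.
    """
    prefix = "http://"
    if not network_location.startswith(prefix):
        raise ValueError("expected an http:// network-location")
    remainder = network_location[len(prefix):]
    # Keep "/" as is; encode other reserved characters such as ":".
    return gateway_url + quote(remainder, safe="/")


print(gateway_base_url("http://gateway.institution.org/oai/",
                       "http://an.oai.org/ma/mini.xml"))
# prints http://gateway.institution.org/oai/an.oai.org/ma/mini.xml
```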

4.2 Responding to OAI-PMH requests

In order to guarantee the accuracy of the exposed metadata, a Static Repository Gateway must use the most recent version of a Static Repository. This may require the Static Repository Gateway to fetch a Static Repository from its persistent Static Repository network-location for every incoming harvesting request. However, a Static Repository Gateway may optimize its performance by caching Static Repositories. In that case a Static Repository Gateway must perform a freshness-test on the cached Static Repository by comparing it with the version at the Static Repository network-location before responding to harvesting requests. It should do so by using an HTTP GET with an If-Modified-Since header that contains the date of the cached version of a Static Repository.
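
The freshness-test described above could be sketched in Python as follows. This is a minimal illustration using only the standard library, not a prescribed implementation: the function names are hypothetical, `cached_at` is assumed to be the POSIX timestamp at which the cached copy was obtained, and error handling beyond the 304 case is left to the caller.

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_freshness_request(network_location, cached_at):
    """Build a conditional HTTP GET for the Static Repository file."""
    request = urllib.request.Request(network_location, method="GET")
    # HTTP dates use the RFC 1123 format in GMT.
    request.add_header("If-Modified-Since", formatdate(cached_at, usegmt=True))
    return request


def fetch_if_modified(network_location, cached_at):
    """Return the new file content as bytes, or None if the cached
    copy is still current (the server answered 304 Not Modified)."""
    request = build_freshness_request(network_location, cached_at)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as error:
        if error.code == 304:
            return None
        raise
```

When `fetch_if_modified` returns `None`, the gateway can answer from its cache; otherwise it would replace the cached copy before responding.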

Given the above freshness requirements, the following scenarios can occur:

4.3 OAI-PMH Identify response provided by a Static Repository Gateway

To support dynamic discovery of Static Repositories, the Static Repository Gateway must include a friends description in the Identify response that it provides for each of the Static Repositories that it makes harvestable via the OAI-PMH. That friends description must list the base URLs of all Static Repositories that are harvestable through the Static Repository Gateway. A Static Repository Gateway may develop a policy to periodically refresh the friends description, for instance based on tracking the (un)availability of listed Static Repositories over time.

The Identify response that a Static Repository Gateway provides for each Static Repository that it makes harvestable must include a gateway description, which effectively reveals that the OAI-PMH response originates through a Static Repository Gateway. The content of that gateway description must be as follows:

An example of a response to the Identify request issued against the Static Repository shown in Section 3.4, obtained through the intermediation of a Static Repository Gateway with network-location http://gateway.institution.org/oai/ is shown below:

Response to the Identify request issued against a Static Repository

<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" 
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
         http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2002-02-08T12:00:01Z</responseDate>
  <request verb="Identify">http://gateway.institution.org/oai/an.oai.org/ma/mini.xml</request>
   <Identify>
    <repositoryName>Demo repository</repositoryName>
    <baseURL>http://gateway.institution.org/oai/an.oai.org/ma/mini.xml</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>jondoe@oai.org</adminEmail>
    <earliestDatestamp>2002-09-19</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
   <description>
      <friends 
          xmlns="http://www.openarchives.org/OAI/2.0/friends/" 
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/friends/
         http://www.openarchives.org/OAI/2.0/friends.xsd">
       <baseURL>http://gateway.institution.org/oai/site1.org/mini/file1</baseURL>
       <baseURL>http://gateway.institution.org/oai/loca.org%3A8080/data</baseURL>
       <baseURL>http://gateway.institution.org/oai/univ.edu/lib/pubs.xml</baseURL>
     </friends>
   </description>
   <description>
      <gateway 
          xmlns="http://www.openarchives.org/OAI/2.0/gateway/" 
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/gateway/
         http://www.openarchives.org/OAI/2.0/gateway.xsd">
         <source>http://an.oai.org/ma/mini.xml</source>
         <gatewayType>Static Repository Gateway</gatewayType>
         <gatewayDescription>
            http://www.openarchives.org/OAI/2.0/guidelines-static-repository.htm
            </gatewayDescription>
          <gatewayAdmin>pat@institution.org</gatewayAdmin>
          <gatewayURL>http://gateway.institution.org/oai/</gatewayURL>
      </gateway>
   </description>
</Identify>
</OAI-PMH>
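
From a harvester's point of view, the friends and gateway descriptions can be read out of such a response with a few lines of code. The Python sketch below is an illustration under assumptions: the function name `gateway_info` and the shape of the returned dictionary are hypothetical, while the namespaces and element names are those used in the example above.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "friends": "http://www.openarchives.org/OAI/2.0/friends/",
    "gw": "http://www.openarchives.org/OAI/2.0/gateway/",
}


def gateway_info(identify_response_xml):
    """Extract gateway and friends information from an Identify response."""
    root = ET.fromstring(identify_response_xml)
    identify = root.find("oai:Identify", NS)

    # The gateway description reveals that the response comes through
    # a Static Repository Gateway; its source is the network-location.
    gateway = identify.find("oai:description/gw:gateway", NS)
    source = None
    if gateway is not None:
        source = (gateway.findtext("gw:source", "", NS)).strip() or None

    # The friends description lists base URLs of other Static
    # Repositories that are harvestable through the same gateway.
    friends = [
        (element.text or "").strip()
        for element in identify.findall(
            "oai:description/friends:friends/friends:baseURL", NS
        )
    ]
    return {"via_gateway": gateway is not None, "source": source, "friends": friends}
```

A harvester could use the returned `friends` list to discover further base URLs, and `source` to locate the underlying Static Repository file.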

4.4 Security Considerations

The following security issues require attention when operating a Static Repository Gateway:

Acknowledgements

Support for the development of the OAI-PMH and for other Open Archives Initiative activities comes from the Digital Library Federation, the Coalition for Networked Information, and from the National Science Foundation through Grant No. IIS-9817416. Individuals who have played a significant role in the development of OAI-PMH version 2.0 are acknowledged in the protocol document.

This Static Repository specification is inspired by the ViDa (Virtual Data Provider) work by Steven Bird and Gary Simons for the Open Language Archives Community (OLAC), a leading community of implementers of the OAI-PMH. Special thanks to Steven Bird <sb@cs.mu.oz.au> and Gary Simons <Gary_Simons@sil.org>.

Document History

2002-11-11: Alpha release.
2002-09-28: Initial pre-release of this document to core OAI team.