The Open Archives Initiative Protocol for Metadata Harvesting
Protocol Version 2.0 of 2002-06-14
Previous version: Protocol Version 1.1 of 2001-07-02
Editors
The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library
From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov> -- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computer Science
Table of Contents
1. Introduction
2. Definitions and Concepts
2.1. Harvester
2.2. Repository
2.3. Item
2.4. Unique Identifier
2.5. Record
2.6. Set
2.7. Selective Harvesting
2.7.1 Selective Harvesting and Datestamps
2.7.2 Selective Harvesting and Sets
3. Protocol Features
3.1. HTTP Embedding of OAI-PMH requests
3.1.1. HTTP Request Format
3.1.2. HTTP Response Format
3.1.3. Response Compression
3.2. XML Response Format
3.2.1. XML Schema for Validating Responses to OAI-PMH Requests
3.3. UTCdatetime
3.3.1. UTCdatetime in Protocol Requests
3.3.2. UTCdatetime in Protocol Responses
3.4. metadataPrefix and Metadata Schema
3.5. Flow Control
3.5.1 Idempotency of resumptionTokens
3.6. Error and Exception Conditions
4. Protocol Requests and Responses
4.1. GetRecord
4.2. Identify
4.3. ListIdentifiers
4.4. ListMetadataFormats
4.5. ListRecords
4.6. ListSets
5. Dublin Core
6. Implementation Guidelines
Acknowledgements
Document History
The Open Archives Initiative Protocol for Metadata Harvesting (referred to as the OAI-PMH in the remainder of this document) provides an application-independent interoperability framework based on metadata harvesting. There are two classes of participants in the OAI-PMH framework:
In this document the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" in bold face are to be interpreted as described in RFC 2119. An implementation is not conformant if it fails to satisfy one or more of the "must" or "required" level requirements for the protocols it implements.
This document refers in several places to "community-specific" practices to which individual protocol implementations may conform. These practices are described in an accompanying Implementation Guidelines document.
A harvester is a client application that issues OAI-PMH requests. A harvester is operated by a service provider as a means of collecting metadata from repositories.
A repository is a network accessible server that can process the six OAI-PMH requests in the manner described in this document. A repository is managed by a data provider to expose metadata to harvesters. To allow various repository configurations, the OAI-PMH distinguishes between three distinct entities related to the metadata made accessible by the OAI-PMH.
A unique identifier unambiguously identifies an item within a repository; the unique identifier is used in OAI-PMH requests for extracting metadata from the item. Items may contain metadata in multiple formats. The unique identifier maps to the item, and all possible records available from a single item share the same unique identifier.
The format of the unique identifier must correspond to that of the URI (Uniform Resource Identifier) syntax. Individual communities may develop community-specific URI schemes for coordinated use across repositories. The scheme component of the unique identifiers must not correspond to that of a recognized URI scheme unless the identifiers conform to that scheme. Repositories may implement the oai-identifier syntax described in the accompanying Implementation Guidelines document.
Unique identifiers play two roles in the protocol:
Note that the identifier described here is not that of a resource . The nature of a resource identifier is outside the scope of the OAI-PMH. To facilitate access to the resource associated with harvested metadata, repositories should use an element in metadata records to establish a linkage between the record (and the identifier of its item) and the identifier (URL, URN, DOI, etc.) of the associated resource. The mandatory Dublin Core format provides the identifier element that should be used for this purpose.
The following example shows an XML-encoding of a record and its components:
<header>
<identifier>oai:arXiv:cs/0112017</identifier>
<datestamp>2002-02-28</datestamp>
<setSpec>cs</setSpec>
<setSpec>math</setSpec>
</header>
<metadata>
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Using Structural Metadata to Localize Experience of Digital
Content</dc:title>
<dc:creator>Dushay, Naomi</dc:creator>
<dc:subject>Digital Libraries</dc:subject>
<dc:description>With the increasing technical sophistication of both
information consumers and providers, there is increasing demand for
more meaningful experiences of digital information. We present a
framework that separates digital object experience, or rendering,
from digital object storage and manipulation, so the
rendering can be tailored to particular communities of users.
</dc:description>
<dc:description>Comment: 23 pages including 2 appendices,
8 figures</dc:description>
<dc:date>2001-12-14</dc:date>
<dc:type>e-print</dc:type>
<dc:identifier>http://arXiv.org/abs/cs/0112017</dc:identifier>
</oai_dc:dc>
</metadata>
<about>
<provenance
xmlns="http://www.openarchives.org/OAI/2.0/provenance/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/provenance/
http://www.openarchives.org/OAI/2.0/provenance/oai_provenance.xsd">
<originDescription>
<baseURL>http://the.oa.org</baseURL>
<identifier>oai:r2:klik001</identifier>
<datestamp>2002-01-01</datestamp>
<metadataPrefix>oai_dc</metadataPrefix>
<harvestDate>2002-02-02T14:10:02Z</harvestDate>
</originDescription>
</provenance>
</about>
A set is an optional construct for grouping items for the purpose of selective harvesting. Repositories may organize items into sets. Set organization may be flat, i.e. a simple list, or hierarchical. Multiple hierarchies with distinct, independent top-level nodes are allowed. Hierarchical organization of sets is expressed in the syntax of the setSpec parameter as described below. When a repository defines a set organization it must include set membership information in the headers of items returned in response to the ListIdentifiers , ListRecords and GetRecord requests.
Each node in a set organization of a repository has:
The following is an example of a possible set hierarchy in a repository:
The following table shows a possible representation of the above set hierarchy by means of setNames and respective setSpecs.
| setName | setSpec |
| Institutions | institution |
| Oceanside University of Nebraska | institution:nebraska |
| Valley View University of Florida | institution:florida |
| Subjects | subject |
| Existential Kenesiology | subject:kenesiology |
| Quantum Psychology | subject:quantum |
An item may be organized in one set, several sets, or no sets at all. In the example above, it is conceivable that an individual item is organized in both subject and institution:florida. A harvester should not assume that harvesting every set in a repository will retrieve metadata from all items in the repository. Items may also be assigned to interior nodes in the set hierarchy.
The actual meaning of a set or of the arrangement of sets in a repository is not defined in the protocol. It is expected that individual communities may formulate well-defined set configurations with perhaps a controlled vocabulary for setNames and setSpec , and may even develop mechanisms for exposing these to harvesters. For example, a group of cooperating e-print archives in a specific discipline may agree on sets that arrange metadata in their repositories based on a controlled subject classification.
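The colon-separated setSpec syntax shown in the table above can be handled with plain string operations. The following is a minimal sketch, not part of the protocol: the helper names `ancestors` and `in_set` are hypothetical, and the sketch assumes that a request for a set should also match items organized in its descendant sets.

```python
def ancestors(set_spec):
    """Return the setSpec of every enclosing node, outermost first."""
    parts = set_spec.split(":")
    return [":".join(parts[:i]) for i in range(1, len(parts))]


def in_set(item_set_specs, requested):
    """True if one of the item's setSpecs is the requested set or lies below it.

    Assumption: descendant sets count as members of the requested set.
    """
    return any(
        spec == requested or spec.startswith(requested + ":")
        for spec in item_set_specs
    )


# An item organized in both "subject" and "institution:florida",
# as in the example hierarchy.
item = ["subject", "institution:florida"]
print(ancestors("institution:florida"))      # ['institution']
print(in_set(item, "institution"))           # True
print(in_set(item, "institution:nebraska"))  # False
```

Matching on `requested + ":"` rather than on a bare prefix keeps a set such as `institution` from accidentally matching an unrelated setSpec that merely starts with the same characters.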
A repository's set hierarchy is represented in the protocol via setSpecs. ListSets returns a list indicating the configuration of sets in a repository. Each member of this list must include a setSpec and a setName and may include a setDescription. ListRecords and ListIdentifiers requests may include an optional set argument, the value of which is a setSpec, to specify the target set for selective harvesting. In the previous example of a set hierarchy, the setSpec institution:nebraska could be used in a request to return only those records that are disseminated from items organized in the set represented by this setSpec. Four issues should be noted here:
Selective harvesting allows harvesters to limit harvest requests to portions of the metadata available from a repository. The OAI-PMH supports selective harvesting with two types of harvesting criteria that may be combined in an OAI-PMH request: datestamps and set membership.
Harvesters may use datestamps to harvest only those records that were created, deleted, or modified within a specified date range. To specify datestamp-based selective harvesting, datestamps are included as values of the optional arguments, from and until, in the ListRecords and ListIdentifiers requests. Harvesting is restricted to the range specified by the from and until arguments, extending back to the earliest datestamp if from is omitted, and forward to the most recent datestamp if until is omitted. Range limits are inclusive: from specifies a bound that must be interpreted as "greater than or equal to", until specifies a bound that must be interpreted as "less than or equal to". Therefore, the from argument must be less than or equal to the until argument. Otherwise, a repository must issue a badArgument error.
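The inclusive range rule can be illustrated with a short sketch. The function names are hypothetical, `ValueError` merely stands in for the badArgument error, and the sketch assumes that all values use the same granularity, so that comparing the ISO8601 strings lexicographically gives chronological order.

```python
def check_range(from_=None, until=None):
    """Raise ValueError (standing in for badArgument) if from is later than until."""
    if from_ is not None and until is not None and from_ > until:
        raise ValueError("badArgument: from is later than until")


def matches(datestamp, from_=None, until=None):
    """Inclusive test: from <= datestamp <= until; an omitted bound is open."""
    check_range(from_, until)
    after_start = from_ is None or datestamp >= from_
    before_end = until is None or datestamp <= until
    return after_start and before_end


print(matches("2002-02-28", "2002-02-28", "2002-02-28"))  # True
print(matches("2002-02-27", from_="2002-02-28"))          # False
```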
Repositories must support selective harvesting with the from and until arguments expressed at day granularity. Optional support for seconds granularity is indicated in the response to the Identify request. The value of datestamps in both requests and responses must comply with the specifications for UTCdatetime in this document. A repository must update the datestamp of a record if a change occurs, the result of which would be a change to the metadata part of the XML-encoding of the record. Such changes include, but are not limited to, changes to the metadata of the record, changes to the metadata format of the record, introduction of a new metadata format, termination of support for a metadata format, etc.
Datestamp ranges for selective harvesting are expressed in the from and until arguments that may be submitted in the ListRecords and ListIdentifiers requests. Repositories must use the following rules to create a ListRecords response matching the specified datestamp range according to the type of change that occurred within the repository. The response to a ListIdentifiers request follows the same rules but is abbreviated to include only headers rather than records.
Every header returned by the GetRecord, ListRecords or ListIdentifiers requests contains a datestamp, which reflects the most recent date and time of the creation, modification, or deletion according to the rules defined above.
Harvesters may specify set membership as a criterion for selective harvesting. To specify set-based selective harvesting, a setSpec is included as the value of the optional set argument to the ListRecords and ListIdentifiers requests, thereby specifying selective harvesting of records from items within the respective set.
When a setSpec is used as an argument, the response must include:
In addition to the base URL, all requests consist of a list of keyword arguments, which take the form of key=value pairs. Arguments may appear in any order and multiple arguments must be separated by ampersands [ &]. Each OAI-PMH request must have at least one key=value pair that specifies the OAI-PMH request issued by the harvester:
The number and nature of additional key=value pairs depends on the arguments for the individual request.
However, since special characters in URIs must be encoded, the correct form of the above GET request URL is:
http://an.oa.org/OAI-script?
verb=GetRecord&identifier=oai%3AarXiv%3Ahep-th%2F9901001&metadataPrefix=oai_dc
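A harvester does not need to apply the escape sequences by hand. The sketch below builds the encoded GET URL with Python's standard `urllib.parse.urlencode`; the helper name `build_get_url` is hypothetical, and the base URL and argument values are the ones used in the example above.

```python
from urllib.parse import urlencode

BASE_URL = "http://an.oa.org/OAI-script"  # base URL from the example above


def build_get_url(base_url, **arguments):
    """Append the key=value pairs to the base URL, percent-encoding the values."""
    return base_url + "?" + urlencode(arguments)


url = build_get_url(
    BASE_URL,
    verb="GetRecord",
    identifier="oai:arXiv:hep-th/9901001",
    metadataPrefix="oai_dc",
)
print(url)
```

The printed URL is the encoded request shown above, joined onto a single line.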
Keyword arguments are carried in the message body of the HTTP POST. The Content-Type of the request must be application/x-www-form-urlencoded. For example, submitting the same request as above using the POST method would use just the base URL as the URL, with the format of the POST being:
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 78
Content-Type: application/x-www-form-urlencoded

verb=GetRecord&identifier=oai%3AarXiv%3Ahep-th%2F9901001&metadataPrefix=oai_dc
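The same request can be prepared for POST with the Python standard library. This is a sketch only: it constructs the request object without sending it, and the variable names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The keyword arguments travel in the message body, not in the URL.
body = urlencode(
    {
        "verb": "GetRecord",
        "identifier": "oai:arXiv:hep-th/9901001",
        "metadataPrefix": "oai_dc",
    }
).encode("ascii")

request = Request(
    "http://an.oa.org/OAI-script",  # just the base URL
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(request.get_method())  # POST
print(len(body))             # 78
```

The body length of 78 bytes matches the Content-Length header in the example. Passing the object to `urllib.request.urlopen` would send it; that step is left out because the host in the example is illustrative.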
The syntax rules for URIs restrict a few characters to special roles in certain contexts, and require that if these characters are used in any other way that they must be written as an escape sequence, i.e. a percent sign followed by the character code in hexadecimal. The reserved characters include:
| Character | URI Role | Escape Sequence |
| / | Path Component Separator | %2F |
| ? | Query Component Separator | %3F |
| # | Fragment Identifier | %23 |
| = | Name/Value Separator | %3D |
| & | Argument Separator in Query Component | %26 |
| : | Host Port Separator | %3A |
| ; | Authority Namespace Separator | %3B |
| " " | Space Character | %20 |
| % | Escape Indicator | %25 |
| + | Escaped Space | %2B |
As a result, these characters must be represented by their respective escape sequence if their use does not correspond to their established URI role. In case of the OAI-PMH, this means that the reserved characters must be encoded when they appear in the value part of the key=value pairs of the request. This applies for both the GET and POST encoding of the OAI-PMH requests.
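The escape sequences in the table can be reproduced with `urllib.parse.quote` from the Python standard library. The sketch below is illustrative; passing `safe=""` makes `quote` escape every reserved character, including `/`, which it would otherwise leave untouched.

```python
from urllib.parse import quote

# The reserved characters listed in the table above.
RESERVED = ["/", "?", "#", "=", "&", ":", ";", " ", "%", "+"]

escapes = {character: quote(character, safe="") for character in RESERVED}

for character, sequence in escapes.items():
    print(repr(character), sequence)
```

Note that `urllib.parse.urlencode` writes a space as `+` rather than `%20`; both forms decode to a space in form-encoded data.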
The Content-Type returned for all OAI-PMH requests must be text/xml.
Response compression is optional in OAI-PMH. Compression of responses to OAI-PMH requests is handled at the level of HTTP, with the following restrictions:
All responses to OAI-PMH requests must be well-formed XML instance documents. Encoding of the XML must use the UTF-8 representation of Unicode. Character references, rather than entity references, must be used. Character references allow XML responses to be treated as stand-alone documents that can be manipulated without dependency on entity declarations external to the document.
The XML data for all responses to OAI-PMH requests must validate against the XML Schema shown at the end of this section. As can be seen from that schema, responses to OAI-PMH requests have the following common markup:
An example of a successful reply to the GetRecord request shown above is of the form:
<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-05-01T19:20:30Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv:hep-th/9901001"
metadataPrefix="oai_dc">http://an.oa.org/OAI-script</request>
<GetRecord>
<record>...</record>
</GetRecord>
</OAI-PMH>
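A harvester can read the common markup with any namespace-aware XML parser. The sketch below uses Python's `xml.etree.ElementTree` on the reply shown above; the variable names are illustrative, and the prefix `oai` is only a local alias for the OAI-PMH namespace.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# The example reply from above, passed as bytes so that the parser
# honours the encoding named in the XML declaration.
RESPONSE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-05-01T19:20:30Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv:hep-th/9901001"
metadataPrefix="oai_dc">http://an.oa.org/OAI-script</request>
<GetRecord>
<record>...</record>
</GetRecord>
</OAI-PMH>
"""

root = ET.fromstring(RESPONSE)
response_date = root.findtext("oai:responseDate", namespaces=OAI_NS)
request = root.find("oai:request", OAI_NS)

print(response_date)         # 2002-05-01T19:20:30Z
print(request.text)          # http://an.oa.org/OAI-script
print(request.get("verb"))   # GetRecord
```

The element content of request is the base URL, and its attributes echo the arguments of the request, so a harvester can reconstruct the request that led to the response.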
<schema targetNamespace="http://www.openarchives.org/OAI/2.0/"
xmlns:oai="http://www.openarchives.org/OAI/2.0/"
xmlns="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<annotation>
<documentation>
XML Schema which can be used to validate replies
to all OAI-PMH v.2.0 requests.
Herbert Van de Sompel. May 13th 2002.
Validated with XML Spy v.4.3 on May 13th 2002.
Validated with XSV 1.203.2.45/1.106.2.22
of 2002/01/11 16:40:28 on May 13th 2002.
</documentation>
</annotation>
<element name="OAI-PMH" type="oai:OAI-PMHtype"/>
<complexType name="OAI-PMHtype">
<sequence>
<element name="responseDate" type="dateTime"/>
<element name="request" type="oai:requestType"/>
<choice>
<element name="error" type="oai:OAI-PMHerrorType"
maxOccurs="unbounded"/>
<element name="Identify" type="oai:IdentifyType"/>
<element name="ListMetadataFormats"
type="oai:ListMetadataFormatsType"/>
<element name="ListSets" type="oai:ListSetsType"/>
<element name="GetRecord" type="oai:GetRecordType"/>
<element name="ListIdentifiers" type="oai:ListIdentifiersType"/>
<element name="ListRecords" type="oai:ListRecordsType"/>
</choice>
</sequence>
</complexType>
<!-- define requestType,
indicating the protocol request that led to the response -->
<!-- element content is BASE-URL,
attributes are arguments of protocol request,
attribute-values are values of arguments of protocol request -->
<!-- ============================================================= -->
<complexType name="requestType">
<simpleContent>
<extension base="anyURI">
<attribute name="verb" type="oai:verbType" use="optional"/>
<attribute name="identifier" type="oai:identifierType"
use="optional"/>
<attribute name="metadataPrefix" type="oai:metadataPrefixType"
use="optional"/>
<attribute name="from" type="oai:UTCdatetimeType"
use="optional"/>
<attribute name="until" type="oai:UTCdatetimeType"
use="optional"/>
<attribute name="set" type="oai:setSpecType" use="optional"/>
<attribute name="resumptionToken" type="string"
use="optional"/>
</extension>
</simpleContent>
</complexType>
<simpleType name="verbType">
<restriction base="string">
<enumeration value="Identify"/>
<enumeration value="ListMetadataFormats"/>
<enumeration value="ListSets"/>
<enumeration value="GetRecord"/>
<enumeration value="ListIdentifiers"/>
<enumeration value="ListRecords"/>
</restriction>
</simpleType>
<!-- define OAI-PMH error conditions -->
<!-- =============================== -->
<complexType name="OAI-PMHerrorType">
<simpleContent>
<extension base="string">
<attribute name="code" type="oai:OAI-PMHerrorcodeType"
use="required"/>
</extension>
</simpleContent>
</complexType>
<simpleType name="OAI-PMHerrorcodeType">
<restriction base="string">
<enumeration value="cannotDisseminateFormat"/>
<enumeration value="idDoesNotExist"/>
<enumeration value="badArgument"/>
<enumeration value="badVerb"/>
<enumeration value="noMetadataFormats"/>
<enumeration value="noRecordsMatch"/>
<enumeration value="badResumptionToken"/>
<enumeration value="noSetHierarchy"/>
</restriction>
</simpleType>
<!-- define OAI-PMH verb containers -->
<!-- ============================== -->
<!-- define Identify container -->
<complexType name="IdentifyType">
<sequence>
<element name="repositoryName" type="string"/>
<element name="baseURL" type="anyURI"/>
<element name="protocolVersion">
<simpleType>
<restriction base="string">
<enumeration value="2.0"/>
</restriction>
</simpleType>
</element>
<element name="adminEmail" type="oai:emailType"
maxOccurs="unbounded"/>
<element name="earliestDatestamp" type="oai:UTCdatetimeType"/>
<element name="deletedRecord" type="oai:deletedRecordType"/>
<element name="granularity" type="oai:granularityType"/>
<element name="compression" type="string"
minOccurs="0" maxOccurs="unbounded"/>
<element name="description" type="oai:descriptionType"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
<!-- define ListMetadataFormats container -->
<complexType name="ListMetadataFormatsType">
<sequence>
<element name="metadataFormat" type="oai:metadataFormatType"
maxOccurs="unbounded"/>
</sequence>
</complexType>
<!-- define ListSets container -->
<complexType name="ListSetsType">
<sequence>
<element name="set" type="oai:setType" maxOccurs="unbounded"/>
<element name="resumptionToken" type="oai:resumptionTokenType"
minOccurs="0"/>
</sequence>
</complexType>
<!-- define GetRecord container -->
<complexType name="GetRecordType">
<sequence>
<element name="record" type="oai:recordType"/>
</sequence>
</complexType>
<!-- define ListRecords container -->
<complexType name="ListRecordsType">
<sequence>
<element name="record" type="oai:recordType"
maxOccurs="unbounded"/>
<element name="resumptionToken" type="oai:resumptionTokenType"
minOccurs="0"/>
</sequence>
</complexType>
<!-- define ListIdentifiers container -->
<complexType name="ListIdentifiersType">
<sequence>
<element name="header" type="oai:headerType"
maxOccurs="unbounded"/>
<element name="resumptionToken" type="oai:resumptionTokenType"
minOccurs="0"/>
</sequence>
</complexType>
<!-- define basic types used in replies to
GetRecord, ListRecords, ListIdentifiers -->
<!-- ======================================= -->
<!-- define recordType -->
<!-- a record has a header, a metadata part, and
an optional about container -->
<complexType name="recordType">
<sequence>
<element name="header" type="oai:headerType"/>
<element name="metadata" type="oai:metadataType" minOccurs="0"/>
<element name="about" type="oai:aboutType"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
<!-- define headerType -->
<!-- a header has a unique identifier, a datestamp,
and setSpec(s) in case the item from which
the record is disseminated belongs to set(s).
the header can carry a deleted status indicating
that the record is deleted. -->
<complexType name="headerType">
<sequence>
<element name="identifier" type="oai:identifierType"/>
<element name="datestamp" type="oai:UTCdatetimeType"/>
<element name="setSpec" type="oai:setSpecType"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
<attribute name="status" type="oai:statusType" use="optional"/>
</complexType>
<!-- define identifierType -->
<simpleType name="identifierType">
<restriction base="anyURI"/>
</simpleType>
<simpleType name="statusType">
<restriction base="string">
<enumeration value="deleted"/>
</restriction>
</simpleType>
<!-- define metadataType -->
<!-- metadata must be expressed in XML that complies
with another XML Schema -->
<!-- metadata must be explicitly qualified in the response -->
<complexType name="metadataType">
<sequence>
<any namespace="##other" processContents="strict"/>
</sequence>
</complexType>
<!-- define aboutType -->
<!-- data "about" the record must be expressed in XML -->
<!-- that is compliant with an XML Schema defined by a community -->
<complexType name="aboutType">
<sequence>
<any namespace="##other" processContents="strict"/>
</sequence>
</complexType>
<!-- define resumptionToken - with 3 optional attributes
can be used in ListSets, ListIdentifiers, ListRecords -->
<complexType name="resumptionTokenType">
<simpleContent>
<extension base="string">
<attribute name="expirationDate" type="dateTime"
use="optional"/>
<attribute name="completeListSize" type="positiveInteger"
use="optional"/>
<attribute name="cursor" type="nonNegativeInteger"
use="optional"/>
</extension>
</simpleContent>
</complexType>
<!-- define descriptionType used for description-element in Identify
and for setDescription element in ListSets-->
<!-- content must be compliant with an XML Schema
defined by a community -->
<complexType name="descriptionType">
<sequence>
<any namespace="##other" processContents="strict"/>
</sequence>
</complexType>
<!-- define UTCdatetime -->
<!-- datestamps are day or seconds granularity -->
<!-- ======================================== -->
<simpleType name="UTCdatetimeType">
<union memberTypes="date dateTime"/>
</simpleType>
<!-- define stuff used for Identify verb only -->
<!-- ======================================== -->
<simpleType name="emailType">
<restriction base="string">
<pattern value="\S+@(\S+\.)+\S+"/>
</restriction>
</simpleType>
<simpleType name="deletedRecordType">
<restriction base="string">
<enumeration value="no"/>
<enumeration value="persistent"/>
<enumeration value="transient"/>
</restriction>
</simpleType>
<simpleType name="granularityType">
<restriction base="string">
<enumeration value="YYYY-MM-DD"/>
<enumeration value="YYYY-MM-DDThh:mm:ssZ"/>
</restriction>
</simpleType>
<!-- define stuff used for ListMetadataFormats verb only -->
<!-- =================================================== -->
<complexType name="metadataFormatType">
<sequence>
<element name="metadataPrefix" type="oai:metadataPrefixType"/>
<element name="schema" type="anyURI"/>
<element name="metadataNamespace" type="anyURI"/>
</sequence>
</complexType>
<simpleType name="metadataPrefixType">
<restriction base="string">
<pattern value="[A-Za-z0-9_!'$\(\)\+\-\.\*]+"/>
</restriction>
</simpleType>
<!-- define stuff used for ListSets verb -->
<!-- =================================== -->
<complexType name="setType">
<sequence>
<element name="setSpec" type="oai:setSpecType"/>
<element name="setName" type="string"/>
<element name="setDescription" type="oai:descriptionType"
minOccurs="0" maxOccurs="unbounded"/>
</sequence>
</complexType>
<!-- define setSpecType -->
<simpleType name="setSpecType">
<restriction base="string">
<pattern value=
"([A-Za-z0-9_!'$\(\)\+\-\.\*])+(:[A-Za-z0-9_!'$\(\)\+\-\.\*]+)*"/>
</restriction>
</simpleType>
</schema>
This Schema is available at http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd
Dates and times are uniformly encoded using ISO8601 and are expressed in UTC throughout the protocol. When time is included, the special UTC designator ("Z") must be used. UTC is implied for dates although no timezone designator is specified. For example, 1957-03-20T20:30:00Z is UTC 8:30:00 PM on March 20th 1957. UTCdatetime is used in both protocol requests and protocol replies, in the way described in the following sections.
Datestamps used as values of the optional arguments from and until in the ListIdentifiers and ListRecords requests are encoded using ISO8601 and are expressed in UTC. These arguments are used to specify datestamp-based selective harvesting. These arguments support the "Complete date" and the "Complete date plus hours, minutes and seconds" granularities defined in ISO8601. The legitimate formats are YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ. Both arguments must have the same granularity. All repositories must support YYYY-MM-DD. A repository that supports YYYY-MM-DDThh:mm:ssZ should indicate so in the Identify response. A request by a harvester with finer granularity than that supported by a repository must produce an error .
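A repository might check the from and until arguments along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, `ValueError` stands in for the OAI-PMH error response, and the patterns check only the shape of a value, not whether it names a real calendar date.

```python
import re

DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"

_PATTERNS = {
    DAY: re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    SECONDS: re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}Z"),
}


def granularity(value):
    """Return the granularity label of a from/until value, judged by shape only."""
    for label, pattern in _PATTERNS.items():
        if pattern.fullmatch(value):
            return label
    raise ValueError(f"not a legitimate UTCdatetime: {value!r}")


def check_arguments(from_=None, until=None, supported=DAY):
    """Reject mixed granularities and values finer than the repository supports."""
    found = {granularity(v) for v in (from_, until) if v is not None}
    if len(found) > 1:
        raise ValueError("from and until must have the same granularity")
    if SECONDS in found and supported == DAY:
        raise ValueError("repository supports day granularity only")


check_arguments("2002-01-01", "2002-02-01")  # accepted
check_arguments("2002-01-01T00:00:00Z", supported=SECONDS)  # accepted
```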
Datestamps appear in the headers of records that are returned in response to ListIdentifiers , GetRecord and ListRecords requests. These datestamps are encoded using ISO8601 and are expressed in UTC; they must be expressed in the finest granularity supported by the repository. The value of the datestamp must correspond to the rules for datestamp-based selective harvesting.
Each protocol response includes a responseDate element, which must be the time and date of the response in UTC. This is encoded using the "Complete date plus hours, minutes, and seconds" variant of ISO8601 . This format is YYYY-MM-DDThh:mm:ssZ.
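Producing a value in this format is straightforward. The following sketch uses Python's `datetime` module; the function name `utc_datetime` is hypothetical, and it expects a timezone-aware value so that the conversion to UTC is unambiguous.

```python
from datetime import datetime, timedelta, timezone


def utc_datetime(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# A local time four hours behind UTC is converted before formatting.
local = datetime(2002, 5, 1, 15, 20, 30, tzinfo=timezone(timedelta(hours=-4)))
print(utc_datetime(local))  # 2002-05-01T19:20:30Z

# For a live response, the current time would be used instead:
print(utc_datetime(datetime.now(timezone.utc)))
```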
A resumptionToken in a protocol reply may include an optional argument expirationDate , which is expressed in UTC. This is encoded using the "Complete date plus hours, minutes, and seconds" variant of ISO8601 . This format is YYYY-MM-DDThh:mm:ssZ.
OAI-PMH supports the dissemination of records in multiple metadata formats from a repository. The ListMetadataFormats request returns the list of all metadata formats available from a repository, each of which has the following properties:
The metadata in each record returned by ListRecords and GetRecord must comply with the conventions of the XML namespace specification . This means that the root element of the metadata part must contain an xmlns attribute, the value of which is the XML namespace URI of the metadata format. The root element must also contain an xsi:schemaLocation attribute that has a value that includes the URL of the XML schema for validation of the metadata. This URL must match the URL of the metadata schema for the metadataPrefix included as an argument to the ListRecords or GetRecord request (the mapping from metadataPrefix to metadata schema is defined by the repository's response to the ListMetadataFormats request).
For purposes of interoperability, repositories must disseminate Dublin Core, without any qualification . Therefore, the protocol reserves the metadataPrefix `oai_dc', and the URL of a metadata schema for unqualified Dublin Core, which is http://www.openarchives.org/OAI/2.0/oai_dc.xsd . The corresponding XML namespace URI is http://www.openarchives.org/OAI/2.0/oai_dc/ .
The metadataPrefix `all' is reserved for future use. Implementations should not use this metadataPrefix.
Communities should adopt guidelines for sharing of metadataPrefixes, metadata schemas and XML namespace URIs of metadata formats. Such guidelines are outside of the scope of the OAI-PMH. The accompanying Implementation Guidelines document provides some sample XML Schema and instance documents for common metadata formats such as MARC and RFC 1807.
A number of OAI-PMH requests return a list of discrete entities: ListRecords returns a list of records, ListIdentifiers returns a list of headers , and ListSets returns a list of sets. Collectively these requests are called list requests. In some cases, these lists may be large and it may be practical to partition them among a series of requests and responses. This partitioning is accomplished as follows:
Details of flow control and the resumptionToken are as follows:
The following optional attributes may be included as part of the resumptionToken element along with the resumptionToken itself:
The following example is a series of ListRecords requests where the complete list consists of 175 records and the repository only returns 100 records per response.
This flow control mechanism, in combination with HTTP transport layer facilities, provides some basic tools with which a repository can enforce an acceptable use policy for its harvesting interface. Communities implementing the OAI-PMH may need more extensive tools to enforce acceptable use policies for either the harvesting interface of their repositories or for the metadata harvested from those repositories. The enforcement of such additional policies is outside of the scope of the OAI-PMH.
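From the harvester's side, flow control amounts to a loop that follows resumptionTokens until the list is complete. The sketch below is illustrative rather than normative. It uses ListIdentifiers to keep the XML short, `fake_repository` is a hypothetical in-memory stand-in for an HTTP request that returns trimmed responses (responseDate and request are omitted), and its token format (the offset of the next page) is invented for the example. It relies on two conventions: an empty resumptionToken element marks the last page, and follow-up requests carry only the verb and the resumptionToken.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.openarchives.org/OAI/2.0/}"
PAGE_SIZE = 100  # headers per response, as in the 175-record example
TOTAL = 175      # size of the complete list


def fake_repository(arguments):
    """Hypothetical stand-in for a repository; returns trimmed response XML."""
    start = int(arguments.get("resumptionToken", 0))
    end = min(start + PAGE_SIZE, TOTAL)
    headers = "".join(
        f"<header><identifier>oai:example:{n}</identifier>"
        f"<datestamp>2002-06-01</datestamp></header>"
        for n in range(start, end)
    )
    # An empty resumptionToken element marks the final page.
    token = str(end) if end < TOTAL else ""
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        f"{headers}<resumptionToken>{token}</resumptionToken>"
        "</ListIdentifiers></OAI-PMH>"
    )


def harvest(fetch, **arguments):
    """Yield every identifier, following resumptionTokens to the end of the list."""
    while True:
        root = ET.fromstring(fetch(arguments))
        for identifier in root.iter(NS + "identifier"):
            yield identifier.text
        token = root.findtext(f".//{NS}resumptionToken")
        if not token:  # element absent or empty: the list is complete
            return
        # Follow-up requests carry only the verb and the resumptionToken.
        arguments = {"verb": arguments["verb"], "resumptionToken": token}


identifiers = list(
    harvest(fake_repository, verb="ListIdentifiers", metadataPrefix="oai_dc")
)
print(len(identifiers))  # 175
```

Because `harvest` takes the fetch function as a parameter, the same loop can be pointed at a real repository by supplying a function that performs the HTTP request and returns the response body.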
Repositories that implement resumptionTokens must do so in a manner that allows harvesters to resume a sequence of requests for incomplete lists by re-issuing a list request with the most recent resumptionToken. The purpose of this is to allow harvesters to recover from network or other errors that would otherwise mean that the list request sequence would have to be started again. A re-issue of a list request with a resumptionToken occurs in two contexts:
In event of an error or exception condition, repositories must indicate OAI-PMH errors, distinguished from HTTP Status-Codes, by including one or more error elements in the response. While one error element is sufficient to indicate the presence of the error or exception condition, repositories should report all errors or exceptions that arise from processing the request. Each error element must have a code attribute that must be from the following table; each error element may also have a free text string value to provide information about the error that is useful to a human reader. These strings are not defined by the OAI-PMH.
| Error Codes | Description | Applicable Verbs |
| badArgument | The request includes illegal arguments, is missing required arguments, includes a repeated argument, or values for arguments have an illegal syntax. | all verbs |
| badResumptionToken | The value of the resumptionToken argument is invalid or expired. | ListIdentifiers ListRecords ListSets |
| badVerb | Value of the verb argument is not a legal OAI-PMH verb, the verb argument is missing, or the verb argument is repeated. | N/A |
| cannotDisseminateFormat | The metadata format identified by the value given for the metadataPrefix argument is not supported by the item or by the repository. | GetRecord ListIdentifiers ListRecords |
| idDoesNotExist | The value of the identifier argument is unknown or illegal in this repository. | GetRecord ListMetadataFormats |
| noRecordsMatch | The combination of the values of the from, until, set and metadataPrefix arguments results in an empty list. | ListIdentifiers ListRecords |
| noMetadataFormats | There are no metadata formats available for the specified item. | ListMetadataFormats |
| noSetHierarchy | The repository does not support sets. | ListSets |
The following example demonstrates error handling in the case of an illegal verb argument. All request URLs shown from now on will be wrapped to make them more readable.
http://arXiv.org/oai2?
verb=nastyVerb
Response
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-05-01T09:18:29Z</responseDate>
<request>http://arXiv.org/oai2</request>
<error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>
The following example demonstrates error handling in the case of a ListSets request to a repository that does not handle sets.
http://arXiv.org/oai2?
verb=ListSets
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-05-01T09:18:29Z</responseDate>
<request verb="ListSets">http://arXiv.org/oai2</request>
<error code="noSetHierarchy">This repository does not support sets</error>
</OAI-PMH>
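A harvester can detect these conditions by looking for error elements that are direct children of the OAI-PMH root element. The following Python sketch is illustrative only and is not part of the protocol; the function name extract_errors and the inline sample are assumptions made for this example, and the sample omits the schema location attributes for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace of the OAI-PMH response elements, as used in the examples.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def extract_errors(response_xml):
    """Return a (code, message) pair for every error element in a response.

    An empty list means the response carries no OAI-PMH error.
    """
    root = ET.fromstring(response_xml)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall(OAI_NS + "error")
    ]


# Abbreviated copy of the badVerb response (hypothetical test input).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<responseDate>2002-05-01T09:18:29Z</responseDate>
<request>http://arXiv.org/oai2</request>
<error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""

errors = extract_errors(sample)
print(errors)
```

Because a repository may report several errors in one response, the sketch returns a list rather than a single value. The message text is informational only; a harvester would normally branch on the code.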
An XML Schema defines the format of valid replies to all OAI-PMH requests.
This verb is used to retrieve an individual metadata record from a repository. Required arguments specify the identifier of the item from which the record is requested and the format of the metadata that should be included in the record. Depending on the level at which a repository tracks deletions, a header with a "deleted" value for the status attribute may be returned, in case the metadata format specified by the metadataPrefix is no longer available from the repository or from the specified item.
http://arXiv.org/oai2?
verb=GetRecord&identifier=oai:arXiv:cs/0112017&metadataPrefix=oai_dc
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-02-08T08:55:46Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv:cs/0112017"
metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
<GetRecord>
<record>
<header>
<identifier>oai:arXiv:cs/0112017</identifier>
<datestamp>2001-12-14</datestamp>
<setSpec>cs</setSpec>
<setSpec>math</setSpec>
</header>
<metadata>
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/
http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Using Structural Metadata to Localize Experience of
Digital Content</dc:title>
<dc:creator>Dushay, Naomi</dc:creator>
<dc:subject>Digital Libraries</dc:subject>
<dc:description>With the increasing technical sophistication of
both information consumers and providers, there is
increasing demand for more meaningful experiences of digital
information. We present a framework that separates digital
object experience, or rendering, from digital object storage
and manipulation, so the rendering can be tailored to
particular communities of users.
</dc:description>
<dc:description>Comment: 23 pages including 2 appendices,
8 figures</dc:description>
<dc:date>2001-12-14</dc:date>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>
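As an illustration of how a harvester might read a successful GetRecord response, the following Python sketch extracts the header fields and any Dublin Core titles. It is a minimal example under stated assumptions: the function name summarize_record is hypothetical, and the inline sample is an abbreviated copy of the response with the schema location attributes and most metadata fields removed.

```python
import xml.etree.ElementTree as ET

# Namespace URIs copied from the example response.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def summarize_record(response_xml):
    """Collect header fields and Dublin Core titles from a GetRecord response."""
    root = ET.fromstring(response_xml)
    record = root.find("oai:GetRecord/oai:record", NS)
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        # A status attribute of "deleted" marks a deleted record.
        "deleted": header.get("status") == "deleted",
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "titles": [t.text for t in record.findall(".//dc:title", NS)],
    }


# Abbreviated sample response (hypothetical test input).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<responseDate>2002-02-08T08:55:46Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv:cs/0112017"
metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
<GetRecord>
<record>
<header>
<identifier>oai:arXiv:cs/0112017</identifier>
<datestamp>2001-12-14</datestamp>
<setSpec>cs</setSpec>
<setSpec>math</setSpec>
</header>
<metadata>
<oai_dc:dc
xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>Using Structural Metadata to Localize Experience of Digital Content</dc:title>
</oai_dc:dc>
</metadata>
</record>
</GetRecord>
</OAI-PMH>"""

summary = summarize_record(sample)
print(summary)
```

The sketch assumes the response contains a GetRecord element; a real harvester would first check for error elements, since an error response has no GetRecord child.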
http://arXiv.org/oai2?
verb=GetRecord&identifier=oai:arXiv:quant-ph/0213001&metadataPrefix=oai_dc
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-02-08T08:55:46Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv:quant-ph/0213001"
metadataPrefix="oai_dc">http://arXiv.org/oai2</request>
<error code="idDoesNotExist">No matching identifier in arXiv</error>
</OAI-PMH>
http://arXiv.org/oai2?
verb=GetRecord&identifier=oai:arXiv:quant-ph/9901001&metadataPrefix=oai_marc
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-02-08T08:55:46Z</responseDate>
<request verb="GetRecord" identifier="oai:arXiv:quant-ph/9901001"
metadataPrefix="oai_marc">http://arXiv.org/oai2</request>
<error code="cannotDisseminateFormat"/>
</OAI-PMH>
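The GetRecord requests in these examples differ only in their identifier and metadataPrefix arguments. As a rough sketch, a harvester written in Python could assemble such a request URL as follows; the helper name build_request_url is an assumption made for this illustration. The standard library function urlencode percent-encodes characters such as ":" and "/" in argument values, so the result is the unwrapped, encoded form of the readable URL shown in the first example.

```python
from urllib.parse import urlencode


def build_request_url(base_url, verb, **arguments):
    """Join a base URL, the verb, and the remaining arguments into a GET URL."""
    query = urlencode({"verb": verb, **arguments})
    return base_url + "?" + query


url = build_request_url(
    "http://arXiv.org/oai2",
    "GetRecord",
    identifier="oai:arXiv:cs/0112017",
    metadataPrefix="oai_dc",
)
print(url)
```

Keeping the verb as a separate parameter makes it harder to omit or repeat it, either of which a repository would report as badVerb.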
This verb is used to retrieve information about a repository. Some of the information returned is required as part of the OAI-PMH. Repositories may also employ the Identify verb to return additional descriptive information.
Arguments: None
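The response must include one instance of each of the following elements: repositoryName, baseURL, protocolVersion, earliestDatestamp, deletedRecord, granularity.
The response must include one or more instances of the following element: adminEmail.
The response may include multiple instances of the following optional elements: compression, description.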
http://memory.loc.gov/cgi-bin/oai?
verb=Identify
<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/
http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
<responseDate>2002-02-08T12:00:01Z</responseDate>
<request verb="Identify">http://memory.loc.gov/cgi-bin/oai</request>
<Identify>
<repositoryName>Library of Congress Open Archive Initiative
Repository 1</repositoryName>
<baseURL>http://memory.loc.gov/cgi-bin/oai</baseURL>
<protocolVersion>2.0</protocolVersion>
<adminEmail>somebody@loc.gov</adminEmail>
<adminEmail>anybody@loc.gov</adminEmail>
<earliestDatestamp>1990-02-01T12:00:00Z</earliestDatestamp>
<deletedRecord>transient</deletedRecord>
<granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
<compression>deflate</compression>
<description>
<oai-identifier
xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation=
"http://www.openarchives.org/OAI/2.0/oai-identifier
http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
<scheme>oai</scheme>
<repositoryIdentifier>lcoa1</repositoryIdentifier>
<delimiter>:</delimiter>
<sampleIdentifier>oai:lcoa1:loc.music/musdi.002</sampleIdentifier>
</oai-identifier>
</description>
<description>
<eprints
xmlns="http://www.openarchives.org/OAI/1.1/eprints"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/1.1/eprints
http://www.openarchives.org/OAI/1.1/eprints.xsd">
<content>
<URL>http://memory.loc.gov/ammem/oamh/lcoa1_content.html</URL>
<text>Selected collections from American Memory at the Library
of Congress</text>
</content>
<metadataPolicy/>
<dataPolicy/>
</eprints>
</description>
<description>
<friends
xmlns="http://www.openarchives.org/OAI/2.0/friends/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/friends/
http://www.openarchives.org/OAI/2.0/friends.xsd">
<baseURL>http://oai.east.org/foo/</baseURL>
<baseURL>http://oai.hq.org/bar/</baseURL>
<baseURL>http://oai.south.org/repo.cgi</baseURL>
</friends>
</description>
</Identify>
</OAI-PMH>
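A harvester typically reads the Identify response once to learn the repository's granularity and deletion policy before harvesting. The following Python sketch shows one possible way to do so. The function name parse_identify is hypothetical, the sample is an abbreviated copy of the example response, and the choice of which elements to treat as repeatable follows that example, where adminEmail occurs twice.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Elements read as single values in this sketch.
SINGLE = ("repositoryName", "baseURL", "protocolVersion",
          "earliestDatestamp", "deletedRecord", "granularity")
# Elements read as lists because they can occur more than once.
REPEATED = ("adminEmail", "compression")


def parse_identify(response_xml):
    """Return the Identify fields of a response as a dictionary."""
    identify = ET.fromstring(response_xml).find(OAI_NS + "Identify")
    info = {name: identify.findtext(OAI_NS + name) for name in SINGLE}
    for name in REPEATED:
        info[name] = [e.text for e in identify.findall(OAI_NS + name)]
    # description containers hold community-defined XML; only count them here.
    info["descriptionCount"] = len(identify.findall(OAI_NS + "description"))
    return info


# Abbreviated sample response (hypothetical test input).
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
<responseDate>2002-02-08T12:00:01Z</responseDate>
<request verb="Identify">http://memory.loc.gov/cgi-bin/oai</request>
<Identify>
<repositoryName>Library of Congress Open Archive Initiative Repository 1</repositoryName>
<baseURL>http://memory.loc.gov/cgi-bin/oai</baseURL>
<protocolVersion>2.0</protocolVersion>
<adminEmail>somebody@loc.gov</adminEmail>
<adminEmail>anybody@loc.gov</adminEmail>
<earliestDatestamp>1990-02-01T12:00:00Z</earliestDatestamp>
<deletedRecord>transient</deletedRecord>
<granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
<compression>deflate</compression>
</Identify>
</OAI-PMH>"""

info = parse_identify(sample)
print(info["granularity"], info["deletedRecord"], info["adminEmail"])
```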
This verb is an abbreviated form of ListRecords, retrieving only headers rather than records. Optional arguments permit selective harvesting of headers based on set membership and/or datestamp. Depending on the repository's support for deletions, a returned header may have a status attribute of "deleted" if a record matching the arguments specified in the request has been deleted.