
Implementation Guidelines for the Open Archives Initiative Protocol for Metadata Harvesting

- Specification and XML Schema for the OAI Identifier Format

Protocol Version 2.0 of 2002-06-14
Document Version 2006/03/09T19:52:00Z
http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm

Editors

The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library

From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov> -- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computer Science

This document is one part of the Implementation Guidelines that accompany the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).

Specification and XML Schema for the OAI Identifier Format

1. Introduction

The OAI identifier format is intended to provide persistent resource identifiers for items in repositories that implement OAI-PMH. This is just one possible format that may be used for identifiers within OAI-PMH.

oai-identifiers are Uniform Resource Names (URNs) in the sense of RFC1737; they are resource identifiers and not resource locators (URLs). Note that here the resource is the metadata (the items) and not the underlying object or "stuff" that the metadata describes. Correspondence between an oai-identifier and any identifier that the object described by the metadata may have is outside the scope of this specification and of the OAI-PMH. Adherence to standards and accord with existing schemes is discussed at the end of this document.

2. Description

2.1 Syntax

The oai-identifier syntax is a restriction of the "general, absolute URI" syntax: <scheme>:<scheme-specific-part>, defined in RFC 2396. The following description uses the same notational conventions as RFC 2396, and the same definitions of digit, alpha, alphanum, reserved, unreserved and uric.

  oai-identifier = scheme ":" namespace-identifier ":" local-identifier

  scheme = "oai"

  namespace-identifier = domainname-word "." domainname
  domainname = domainname-word [ "." domainname ]
  domainname-word = alpha *( alphanum | "-" )

  local-identifier = 1*uric

Any uric elements are permitted in the local-identifier. Since characters in the reserved set do not have any special meaning in the local-identifier component, they are permitted unescaped. All characters not included in the unreserved and reserved sets must be escaped (using the same encoding as OAI-PMH requests). Characters in the unreserved and reserved sets must not be escaped. An oai-identifier should never be unescaped; the sole purpose of permitting escaped characters is to allow repositories to map any internal identifier to the local-identifier part of an oai-identifier. The following definitions are copied from RFC 2396 for convenience:

  uric        = reserved | unreserved | escaped
  reserved    = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
  unreserved  = alphanum | mark
  mark        = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

To avoid the possibility of inconsistently generated escaped characters in an oai-identifier, the hex digits must use uppercase for the letters A through F. This is a further restriction on RFC 2396. Thus, escaped and hex are defined as follows:

  escaped     = "%" hex hex
  hex         = digit | "A" | "B" | "C" | "D" | "E" | "F"
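
As an illustration only, the escaping rules above can be sketched in Python. The function name make_local_identifier is hypothetical, and the sketch assumes that characters outside ASCII are converted to UTF-8 bytes before escaping, which this document does not itself state. It relies on urllib.parse.quote from the standard library, which leaves the listed safe characters untouched and writes escapes with uppercase hex digits.

```python
from urllib.parse import quote

# Character sets copied from the RFC 2396 definitions quoted above.
RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"


def make_local_identifier(internal_id):
    """Map a repository-internal identifier to a local-identifier.

    Reserved and unreserved characters are left as they are; everything
    else becomes %XX with uppercase hex digits.
    """
    if not internal_id:
        # local-identifier = 1*uric, so at least one character is needed.
        raise ValueError("local-identifier must not be empty")
    # Assumption: non-ASCII characters are escaped as UTF-8 bytes.
    return quote(internal_id, safe=RESERVED + MARK, encoding="utf-8")


print(make_local_identifier("ab cd"))  # ab%20cd
print(make_local_identifier("ab<cd"))  # ab%3Ccd
print(make_local_identifier("ab?cd"))  # ab?cd
```

Note that a literal percent sign in the internal identifier is itself escaped (as %25), so the mapping stays unambiguous.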

2.2 Namespace Identifier

Organizations must choose namespace-identifier values which correspond to a domain-name that they have registered, and are committed to maintaining. Note that since the oai-identifier is case-sensitive, a particular capitalization style must be selected and used consistently. A single domain name should not be used with variant capitalizations.

Domain name registration is used to avoid the need for any additional registration service for oai-identifiers. Identifiers based on domain names guarantee global uniqueness without the OAI registration that was required by the earlier v1.0/1.1 specification.

2.3 Equivalence

Two oai-identifiers are equivalent if they are identical strings. All three parts of the oai-identifier are case sensitive. Any escaped elements must be left escaped; there is no ambiguity because it is permissible (and required) only to escape characters that cannot be included directly.

2.4 Backwards Compatibility

An oai-identifier scheme was introduced in OAI-PMH v1.0 and remained unchanged in OAI-PMH v1.1. This scheme has been widely adopted and existing identifiers may continue to be used by referring to the old schema: http://www.openarchives.org/OAI/1.1/oai-identifier.xsd.

To use this new oai-identifier scheme, repositories must make the following changes:

2.5 Use as Arguments in OAI-PMH Requests

When used as an argument in an OAI-PMH request, an oai-identifier must be correctly encoded. This means that the colon (:) separators and the percent (%) characters of escaped characters in the local-identifier part must be URL encoded. For example, the oai-identifier oai:an.oai.org:ab%3Ccd would be encoded as identifier=oai%3Aan.oai.org%3Aab%253Ccd in an OAI-PMH request. This means that characters in some internal identifier that an oai-identifier is derived from may be URL encoded twice -- once to make the oai-identifier, and a second time to express the oai-identifier in a URL. The URL will be decoded once to recover the oai-identifier.
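
The double encoding described above can be reproduced with a short Python sketch. The function name encode_identifier_argument is hypothetical. Passing safe="" to urllib.parse.quote is a choice made here so that the colon and percent characters are encoded; it also encodes a few characters (such as parentheses) that need not be, which is harmless because the server decodes the argument exactly once.

```python
from urllib.parse import quote, unquote


def encode_identifier_argument(oai_identifier):
    """Return the identifier argument as it would appear in a request URL."""
    # safe="" forces ":" to %3A, "/" to %2F and "%" to %25.
    return "identifier=" + quote(oai_identifier, safe="")


argument = encode_identifier_argument("oai:an.oai.org:ab%3Ccd")
print(argument)  # identifier=oai%3Aan.oai.org%3Aab%253Ccd

# Decoding once recovers the oai-identifier; the %3C escape inside the
# local-identifier is left alone, as required.
recovered = unquote(argument.split("=", 1)[1])
print(recovered)  # oai:an.oai.org:ab%3Ccd
```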

2.6 Examples

The following are valid oai-identifiers:

oai:arXiv.org:hep-th/9901001

oai:foo.org:some-local-id-53
oai:FOO.ORG:some-local-id-53     ;not the same as above, 
                                 ;should not use foo.org _and_ FOO.ORG

oai:foo.org:some-local-id-54
oai:foo.org:Some-Local-Id-54     ;not the same as above, distinct identifier

oai:wibble.org:ab%20cd           ;space in internal id correctly escaped 
oai:wibble.org:ab?cd             ;question mark should not be escaped 

The following are not valid oai-identifiers:

something:arXiv.org:hep-th/9901001   ;bad scheme

oai:999:abc123                   ;namespace-identifier must not start with digit
oai:wibble:abc123                ;namespace-identifier must be domain name

oai:wibble.org:ab cd             ;space not permitted (must be escaped as %20) 
oai:wibble.org:ab#cd             ;# not permitted
oai:wibble.org:ab<cd             ;< not permitted
oai:wibble.org:ab%3ccd           ;< must be escaped as %3C not %3c
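
The examples above can be checked mechanically. The following Python sketch is one possible validator, not a normative one: the name is_valid_oai_identifier is hypothetical, the regular expression follows the grammar of section 2.1, and the extra pass over escapes is this sketch's reading of the rule that reserved and unreserved characters must not be escaped.

```python
import re

RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"

# domainname-word = alpha *( alphanum | "-" )
WORD = r"[A-Za-z][A-Za-z0-9-]*"
# namespace-identifier: at least two words joined by dots
NAMESPACE = WORD + r"(?:\." + WORD + r")+"
# One character that may appear unescaped in the local-identifier
LITERAL = "[A-Za-z0-9" + re.escape(RESERVED + MARK) + "]"
# escaped = "%" hex hex, uppercase hex digits only
ESCAPED = r"%[0-9A-F]{2}"

OAI_IDENTIFIER = re.compile(
    "oai:" + NAMESPACE + ":(?:" + LITERAL + "|" + ESCAPED + ")+"
)


def is_valid_oai_identifier(value):
    """Return True if value matches the oai-identifier syntax."""
    if OAI_IDENTIFIER.fullmatch(value) is None:
        return False
    # Reject escapes of characters that must appear literally, e.g. %3F.
    for hex_pair in re.findall(r"%([0-9A-F]{2})", value):
        if re.fullmatch(LITERAL, chr(int(hex_pair, 16))):
            return False
    return True


print(is_valid_oai_identifier("oai:arXiv.org:hep-th/9901001"))  # True
print(is_valid_oai_identifier("oai:wibble.org:ab%20cd"))        # True
print(is_valid_oai_identifier("oai:wibble:abc123"))             # False
print(is_valid_oai_identifier("oai:wibble.org:ab%3ccd"))        # False
```

The validator only checks syntax. It cannot tell whether the namespace-identifier is a domain name that the organization has actually registered.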

3. XML Schema for description container

The following XML schema (oai-identifier.xsd) defines the format of a description container in the Identify response so that repositories may expose their compliance with the oai-identifier format. The value of the repositoryIdentifier element is the namespace-identifier, which is not bound to a single repository. The element name was kept to maintain continuity with v1.0/1.1 of this specification.


<schema targetNamespace="http://www.openarchives.org/OAI/2.0/oai-identifier"
  xmlns:oai-identifier="http://www.openarchives.org/OAI/2.0/oai-identifier"
  xmlns="http://www.w3.org/2001/XMLSchema"
  elementFormDefault="qualified"
  attributeFormDefault="unqualified">

  <annotation>
    <documentation>
      Schema for description section of Identify reply of OAI-PMH v2.0.
      For repositories that comply with the oai format for unique identifiers 
      for items records. 
      See: http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
      Validated with http://www.w3.org/2001/03/webdata/xsv on 16May2002 
      Simeon Warner $Date: 2002/06/21 20:14:34 $
    </documentation>
  </annotation>

  <element name="oai-identifier" type="oai-identifier:oai-identifierType"/>

  <complexType name="oai-identifierType">
    <sequence>
      <element name="scheme" minOccurs="1" maxOccurs="1" 
               type="string" fixed="oai"/>
      <element name="repositoryIdentifier" minOccurs="1" maxOccurs="1" 
               type="oai-identifier:repositoryIdentifierType"/>
      <element name="delimiter" minOccurs="1" maxOccurs="1"
               type="string" fixed=":"/>
      <element name="sampleIdentifier" minOccurs="1" maxOccurs="1" 
               type="oai-identifier:sampleIdentifierType"/>
    </sequence>
  </complexType>

  <simpleType name="repositoryIdentifierType">
    <restriction base="string">
      <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"/>
    </restriction>
  </simpleType>

  <simpleType name="sampleIdentifierType">
    <restriction base="string">
      <pattern 
value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"/>

<!--meta ., \, ?, *, +, {, } (, ), [ or ] -->
    </restriction>
  </simpleType>

</schema>
This Schema is available at http://www.openarchives.org/OAI/2.0/oai-identifier.xsd

3.1 Examples

The following examples are excerpts from Identify responses which may contain zero or more <description> containers.

<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
      http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
    <scheme>oai</scheme> 
    <repositoryIdentifier>bespa.org</repositoryIdentifier>    
    <delimiter>:</delimiter> 
    <sampleIdentifier>oai:bespa.org:medi99-123</sampleIdentifier>
  </oai-identifier>
</description>
<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier" 
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
      xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
      http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
    <scheme>oai</scheme>
    <repositoryIdentifier>oai-stuff.foo.org</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:oai-stuff.foo.org:5324</sampleIdentifier>
  </oai-identifier>
</description>
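
A harvester might read such a container as sketched below in Python with the standard xml.etree.ElementTree module. The function name read_oai_identifier_description is hypothetical, and the sketch assumes that a single <description> element has already been cut out of the Identify response as a string, without a namespace of its own, as in the excerpts above.

```python
import xml.etree.ElementTree as ET

# Namespace of the oai-identifier description container.
NS = {"oid": "http://www.openarchives.org/OAI/2.0/oai-identifier"}

FIELDS = ("scheme", "repositoryIdentifier", "delimiter", "sampleIdentifier")


def read_oai_identifier_description(description_xml):
    """Return the four container fields as a dict, or None if absent."""
    root = ET.fromstring(description_xml)
    container = root.find("oid:oai-identifier", NS)
    if container is None:
        return None
    return {
        name: container.findtext("oid:" + name, namespaces=NS)
        for name in FIELDS
    }


example = """
<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
    <scheme>oai</scheme>
    <repositoryIdentifier>oai-stuff.foo.org</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:oai-stuff.foo.org:5324</sampleIdentifier>
  </oai-identifier>
</description>
"""

info = read_oai_identifier_description(example)
print(info["repositoryIdentifier"])  # oai-stuff.foo.org
print(info["sampleIdentifier"])      # oai:oai-stuff.foo.org:5324
```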

4. Adherence to standards and accord with existing schemes

The following two sections describe how the oai-identifier meets the requirements for URN schemes outlined in RFC1737.

4.1 Functional requirements

4.2 Encoding requirements

oai-identifiers are not designed for human use; they are designed to be used only with the OAI-PMH. As such, presentation in text, electronic mail, etc. is not important. This makes the encoding requirements considerably simpler than those described in RFC1737:

Acknowledgements

Support for the development of the OAI-PMH and for other Open Archives Initiative activities comes from the Digital Library Federation, the Coalition for Networked Information, and from the National Science Foundation through Grant No. IIS-9817416. Individuals who have played a significant role in the development of OAI-PMH version 2.0 are acknowledged in the protocol document.

Document History

2006-03-09: Added clarification that repositoryIdentifier is the container for the namespace-identifier and is not bound to a particular repository.
2002-06-21: Added type definitions to scheme and delimiter elements in schema.
2002-06-14: Release of this document, combined with the release of OAI-PMH version 2.0.

This work is licensed under a Creative Commons License.