Editors
The OAI Executive:
Carl Lagoze <lagoze@cs.cornell.edu> -- Cornell University - Computer Science
Herbert Van de Sompel <herbertv@lanl.gov> -- Los Alamos National Laboratory - Research Library
From the OAI Technical Committee:
Michael Nelson <m.l.nelson@larc.nasa.gov> -- NASA - Langley Research Center
Simeon Warner <simeon@cs.cornell.edu> -- Cornell University - Computer Science
This document is one part of the Implementation Guidelines that accompany the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
The OAI identifier format is intended to provide persistent resource identifiers for items in repositories that implement OAI-PMH. This is just one possible format that may be used for identifiers within OAI-PMH.
oai-identifiers are Uniform Resource Names (URNs) in the sense
of RFC1737; they are resource
identifiers and not resource locators (URLs). Note that here the resource
is the metadata (the items) and not the underlying object or "stuff" that the
metadata describes. Correspondence between an oai-identifier and
any identifier that the object described by the metadata may have is outside
the scope of this specification and of the OAI-PMH. Adherence
to standards and accord with existing schemes is discussed at the end of
this document.
The oai-identifier syntax is a restriction of the
"general, absolute URI" syntax:
<scheme>:<scheme-specific-part>,
defined in
RFC 2396.
The following description uses the same notational conventions as
RFC 2396,
and the same definitions of
digit, alpha, alphanum,
reserved, unreserved and uric.
oai-identifier       = scheme ":" namespace-identifier ":" local-identifier
scheme               = "oai"
namespace-identifier = domainname-word "." domainname
domainname           = domainname-word [ "." domainname ]
domainname-word      = alpha *( alphanum | "-" )
local-identifier     = 1*uric
Any uric elements are permitted in the local-identifier.
Since characters in the reserved set do not have any
special meaning in the local-identifier component, they
are permitted unescaped. All characters not included
in the unreserved and reserved sets must
be escaped (using the same
encoding
as OAI-PMH requests).
Characters in the unreserved and reserved sets
must not be escaped.
An oai-identifier should never be unescaped; the sole
purpose of permitting escaped characters is to allow
repositories to map any internal identifier to the
local-identifier part of an oai-identifier.
The following definitions are copied from
RFC 2396
for convenience:
uric = reserved | unreserved | escaped
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
unreserved = alphanum | mark
mark = "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"
To avoid the possibility of inconsistently generated escaped
characters in an oai-identifier, the hex
digits must use uppercase for the letters A through F.
This is a further restriction on RFC 2396. Thus, escaped and
hex are defined as follows:
escaped = "%" hex hex
hex     = digit | "A" | "B" | "C" | "D" | "E" | "F"
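The escaping rules above can be sketched in Python as follows. This is a minimal illustration, not part of the specification: the function names are hypothetical, and the sketch assumes that non-ASCII characters in an internal identifier are escaped as their UTF-8 bytes. It relies on the standard library's urllib.parse.quote, which leaves letters and digits alone and emits uppercase hex digits.

```python
from urllib.parse import quote

# Characters that must NOT be escaped in a local-identifier:
# the RFC 2396 "reserved" set plus the "mark" part of "unreserved".
RESERVED = ";/?:@&=+$,"
MARK = "-_.!~*'()"


def make_local_identifier(internal_id: str) -> str:
    """Map an internal identifier to a local-identifier (hypothetical helper).

    Assumption: non-ASCII characters are escaped as UTF-8 bytes,
    which is quote()'s default behaviour.
    """
    if not internal_id:
        raise ValueError("local-identifier needs at least one character")
    # Everything outside alphanum, RESERVED and MARK becomes %XX
    # with uppercase hex digits; a literal "%" becomes %25.
    return quote(internal_id, safe=RESERVED + MARK)


def make_oai_identifier(namespace_identifier: str, internal_id: str) -> str:
    """Assemble scheme, namespace-identifier and local-identifier."""
    return f"oai:{namespace_identifier}:{make_local_identifier(internal_id)}"


if __name__ == "__main__":
    print(make_oai_identifier("wibble.org", "ab cd"))
    print(make_oai_identifier("wibble.org", "ab?cd"))
```

Because the escaped form is never unescaped again, the internal identifier should be escaped exactly once, at the point where the oai-identifier is created.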
Organizations must choose namespace-identifier values
which correspond to a domain-name that they have registered, and are
committed to maintaining. Note that since the oai-identifier
is case-sensitive, a particular capitalization style must be selected and
used consistently. A single domain name should not be used with variant
capitalizations.
Domain name registration is used to avoid the need for any additional
registration service for oai-identifiers. Domain name
based identifiers guarantee global uniqueness without the need for
OAI registration as required with the earlier, v1.0/1.1 specification.
Two oai-identifiers are equivalent if they are identical
strings. All three parts of the oai-identifier are case
sensitive. Any escaped elements must be left escaped;
there is no ambiguity because it is permissible (and required) only
to escape characters that cannot be included directly.
An oai-identifier scheme was introduced in
OAI-PMH v1.0
and remained unchanged in
OAI-PMH v1.1.
This scheme has been widely adopted and existing identifiers may
continue to be used by referring to the old schema:
http://www.openarchives.org/OAI/1.1/oai-identifier.xsd.
To use this new oai-identifier scheme, repositories must
make the following changes:
1. Update the Identify response to refer to the new schema.
2. Choose a namespace-identifier to replace the repository-identifier.
   A single namespace-identifier may be used
   for identifiers in multiple repositories operated by the same organization.
   The same oai-identifier description block
   would then be used in the responses to Identify requests for each repository.
   Uniqueness of the namespace-identifier is guaranteed through
   domain name registration and not through registration with the
   OAI validation service, as it was with v1.0/1.1.
3. Ensure that the local-identifier components of any identifiers
   exposed use the restricted character set (uric) of this specification.
   This may mean that internal identifiers need to be escaped to create the
   local-identifier component. The characters <space>
   and # were used with the earlier oai-identifier scheme and
   may no longer be used in the local-identifier component.
When used as an argument in an OAI-PMH request, an oai-identifier
must be correctly encoded. This means that the colon (:)
separators and the percent (%) characters of escaped
characters in the local-identifier part must be
URL encoded.
For example, the oai-identifier
oai:an.oai.org:ab%3Ccd would be encoded as
identifier=oai%3Aan.oai.org%3Aab%253Ccd in an OAI-PMH request.
This means that characters in some internal identifier that an
oai-identifier is derived from may be URL encoded twice
-- once to make the oai-identifier, and a second time
to express the oai-identifier in a URL. The URL will be decoded
once to recover the oai-identifier.
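The double encoding described above can be illustrated with a short Python sketch. The function names are hypothetical. Passing safe="" to urllib.parse.quote encodes every reserved character, not only the colon and percent characters named above; that choice is an assumption made here for simplicity.

```python
from urllib.parse import quote, unquote


def encode_identifier_argument(oai_identifier: str) -> str:
    """Build the identifier=... argument of an OAI-PMH request.

    safe="" means ":" becomes %3A and the "%" of an already escaped
    character becomes %25 (the second round of encoding).
    """
    return "identifier=" + quote(oai_identifier, safe="")


def decode_identifier_argument(value: str) -> str:
    """Decode the argument value exactly once to recover the oai-identifier."""
    return unquote(value)


if __name__ == "__main__":
    arg = encode_identifier_argument("oai:an.oai.org:ab%3Ccd")
    print(arg)  # identifier=oai%3Aan.oai.org%3Aab%253Ccd
    print(decode_identifier_argument(arg.split("=", 1)[1]))  # oai:an.oai.org:ab%3Ccd
```

Note that the decoded value still contains %3C: the receiving repository must not unescape it a second time.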
The following are valid oai-identifiers:
oai:arXiv.org:hep-th/9901001
oai:foo.org:some-local-id-53
oai:FOO.ORG:some-local-id-53 ;not the same as above,
;should not use foo.org _and_ FOO.ORG
oai:foo.org:some-local-id-54
oai:foo.org:Some-Local-Id-54 ;not the same as above, distinct identifier
oai:wibble.org:ab%20cd ;space in internal id correctly escaped
oai:wibble.org:ab?cd ;question mark should not be escaped
The following are not valid oai-identifiers:
something:arXiv.org:hep-th/9901001 ;bad scheme
oai:999:abc123 ;namespace-identifier must not start with digit
oai:wibble:abc123 ;namespace-identifier must be domain name
oai:wibble.org:ab cd ;space not permitted (must be escaped as %20)
oai:wibble.org:ab#cd ;# not permitted
oai:wibble.org:ab<cd ;< not permitted
oai:wibble.org:ab%3ccd ;< must be escaped as %3C not %3c
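The valid and invalid examples above can be checked mechanically. The following Python sketch translates the grammar given in this document into a regular expression. The names are hypothetical, and the expression follows the grammar in the text, not the pattern in the XML schema.

```python
import re

# domainname-word = alpha *( alphanum | "-" )
WORD = r"[A-Za-z][A-Za-z0-9-]*"
# uric = reserved | unreserved | escaped, with uppercase-only hex digits
URIC = r"(?:[A-Za-z0-9;/?:@&=+$,\-_.!~*'()]|%[0-9A-F]{2})"
# "oai" ":" namespace-identifier ":" local-identifier, where the
# namespace-identifier needs at least two dot-separated words.
OAI_IDENTIFIER = re.compile(rf"oai:{WORD}(?:\.{WORD})+:{URIC}+")


def is_valid_oai_identifier(candidate: str) -> bool:
    """Return True if the whole string matches the oai-identifier grammar."""
    return OAI_IDENTIFIER.fullmatch(candidate) is not None


VALID = [
    "oai:arXiv.org:hep-th/9901001",
    "oai:FOO.ORG:some-local-id-53",
    "oai:wibble.org:ab%20cd",
    "oai:wibble.org:ab?cd",
]
INVALID = [
    "something:arXiv.org:hep-th/9901001",
    "oai:999:abc123",
    "oai:wibble:abc123",
    "oai:wibble.org:ab cd",
    "oai:wibble.org:ab#cd",
    "oai:wibble.org:ab<cd",
    "oai:wibble.org:ab%3ccd",
]

if __name__ == "__main__":
    for identifier in VALID + INVALID:
        print(identifier, is_valid_oai_identifier(identifier))
```

A syntactic check of this kind cannot tell whether the namespace-identifier is a domain name the organization has actually registered, nor whether the capitalization is used consistently.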
description container
The following XML schema
(oai-identifier.xsd)
defines the format of a description container in the
Identify response so that repositories may expose their compliance
with the oai-identifier format.
The value of the repositoryIdentifier element
is the namespace-identifier, which is not bound to a single
repository. The element name was kept to maintain continuity with v1.0/1.1
of this specification.
<schema targetNamespace="http://www.openarchives.org/OAI/2.0/oai-identifier"
xmlns:oai-identifier="http://www.openarchives.org/OAI/2.0/oai-identifier"
xmlns="http://www.w3.org/2001/XMLSchema"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<annotation>
<documentation>
Schema for description section of Identify reply of OAI-PMH v2.0.
For repositories that comply with the oai format for unique identifiers
for items records.
See: http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
Validated with http://www.w3.org/2001/03/webdata/xsv on 16May2002
Simeon Warner $Date: 2002/06/21 20:14:34 $
</documentation>
</annotation>
<element name="oai-identifier" type="oai-identifier:oai-identifierType"/>
<complexType name="oai-identifierType">
<sequence>
<element name="scheme" minOccurs="1" maxOccurs="1"
type="string" fixed="oai"/>
<element name="repositoryIdentifier" minOccurs="1" maxOccurs="1"
type="oai-identifier:repositoryIdentifierType"/>
<element name="delimiter" minOccurs="1" maxOccurs="1"
type="string" fixed=":"/>
<element name="sampleIdentifier" minOccurs="1" maxOccurs="1"
type="oai-identifier:sampleIdentifierType"/>
</sequence>
</complexType>
<simpleType name="repositoryIdentifierType">
<restriction base="string">
<pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"/>
</restriction>
</simpleType>
<simpleType name="sampleIdentifierType">
<restriction base="string">
<pattern
value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&amp;=\+$,%]+"/>
<!--meta ., \, ?, *, +, {, } (, ), [ or ] -->
</restriction>
</simpleType>
</schema>
This schema is available at http://www.openarchives.org/OAI/2.0/oai-identifier.xsd
The following examples are excerpts from Identify responses which may contain
zero or more <description> containers.
<description>
<oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
<scheme>oai</scheme>
<repositoryIdentifier>bespa.org</repositoryIdentifier>
<delimiter>:</delimiter>
<sampleIdentifier>oai:bespa.org:medi99-123</sampleIdentifier>
</oai-identifier>
</description>
<description>
<oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai-identifier
http://www.openarchives.org/OAI/2.0/oai-identifier.xsd">
<scheme>oai</scheme>
<repositoryIdentifier>oai-stuff.foo.org</repositoryIdentifier>
<delimiter>:</delimiter>
<sampleIdentifier>oai:oai-stuff.foo.org:5324</sampleIdentifier>
</oai-identifier>
</description>
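A harvester might read such a description block as sketched below. This is an illustration only: the function names and the SAMPLE constant are hypothetical, and the sketch assumes the caller has already extracted a single description element from the Identify response. It uses the standard library's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Namespace of the oai-identifier description block.
NS = {"oid": "http://www.openarchives.org/OAI/2.0/oai-identifier"}
FIELDS = ("scheme", "repositoryIdentifier", "delimiter", "sampleIdentifier")

SAMPLE = """
<description>
  <oai-identifier xmlns="http://www.openarchives.org/OAI/2.0/oai-identifier">
    <scheme>oai</scheme>
    <repositoryIdentifier>bespa.org</repositoryIdentifier>
    <delimiter>:</delimiter>
    <sampleIdentifier>oai:bespa.org:medi99-123</sampleIdentifier>
  </oai-identifier>
</description>
"""


def read_oai_identifier_description(description_xml: str) -> dict:
    """Extract the four child elements of the oai-identifier block."""
    root = ET.fromstring(description_xml.strip())
    block = root.find("oid:oai-identifier", NS)
    if block is None:
        raise ValueError("no oai-identifier description block found")
    fields = {name: block.findtext(f"oid:{name}", namespaces=NS) for name in FIELDS}
    missing = [name for name, value in fields.items() if value is None]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    return fields


def sample_is_consistent(fields: dict) -> bool:
    """Check that sampleIdentifier starts with scheme, delimiter and namespace."""
    prefix = (fields["scheme"] + fields["delimiter"]
              + fields["repositoryIdentifier"] + fields["delimiter"])
    return fields["sampleIdentifier"].startswith(prefix)


if __name__ == "__main__":
    info = read_oai_identifier_description(SAMPLE)
    print(info["repositoryIdentifier"], sample_is_consistent(info))
```

The consistency check is a convenience added for this sketch; the schema itself validates each element separately and does not tie the sampleIdentifier to the repositoryIdentifier value.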
The following two sections describe how the oai-identifier
meets the requirements for URN schemes outlined in
RFC1737.
- oai-identifiers should have global scope in the sense
  that two equivalent oai-identifiers should have the same meaning
  everywhere (i.e. they identify the same metadata item).
- An oai-identifier should never be assigned to
  different metadata items. To be useful for deduplication, the same metadata item
  should not have more than one oai-identifier. Note that this does not imply
  that there will not be more than one metadata item (and hence oai-identifier)
  that describe the same underlying resource.
- oai-identifiers will be permanent.
  That is, oai-identifiers must remain globally unique and items should
  retain the same oai-identifier.
  (This is considerably weaker than RFC1737.)
- The number of oai-identifiers should not be
  limited by the syntax. Separation into two parts,
  a namespace-identifier and a local-identifier,
  assures scalability in the same way as other URI schemes.
- The syntax of oai-identifiers does
  not accommodate existing oai-identifiers created
  for use with OAI-PMH versions 1.0 and 1.1. Repositories wishing
  to use that scheme may still do so;
  see "Backwards compatibility".
- The oai-identifier scheme is designed
  around a model of namespace-identifier and
  local-identifier. While the syntax of
  local-identifier is undefined and may be used for some
  possible extensions, the rest of the syntax is not. A more complex
  scheme could be supported by extension of the
  namespace-identifier syntax or by the creation of a
  new URI scheme (OAI-PMH allows arbitrary URIs as identifiers).
  (This is considerably weaker than RFC1737.)
- oai-identifiers are intended to serve as
  identifiers for metadata items within repositories. It is not intended
  that oai-identifiers be used outside the context of a set
  of interacting repositories and harvesters.
  With knowledge of the repository that an oai-identifier
  was obtained from, it will be possible to obtain the status of the
  item and to disseminate metadata from it (provided the OAI-PMH
  interface is operational).
  No general resolution scheme is proposed or imagined. Any such scheme
  would involve an additional registration database.
  (This is considerably weaker than RFC1737.)
oai-identifiers are not designed for human use; they are
designed to be used only with the OAI-PMH. As such, presentation in
text, electronic mail etc. is not important. This makes the encoding
requirements considerably simpler than those described in
RFC1737:
- oai-identifiers should be able to be
  transported unmodified over common Internet protocols (e.g. HTTP) and using
  common encoding standards (e.g. XML, RDF).
- oai-identifiers should be easy to parse.
- oai-identifiers should be short so that
  transmitting them and managing them within computer programs is convenient.
Support for the development of the OAI-PMH and for other Open Archives Initiative activities comes from the Digital Library Federation, the Coalition for Networked Information, and from the National Science Foundation through Grant No. IIS-9817416. Individuals who have played a significant role in the development of OAI-PMH version 2.0 are acknowledged in the protocol document.
2006-03-09: Added clarification that repositoryIdentifier is the
container for the namespace-identifier and is not bound to a particular
repository.
2002-06-21: Added type definitions to scheme and
delimiter elements in schema.
2002-06-14: Release of this document, combined with the release of OAI-PMH
version 2.0.
