ORE User Guide - Primer

17 October 2008

This version:
http://www.openarchives.org/ore/1.0/primer
Latest version:
http://www.openarchives.org/ore/primer
Previous version:
http://www.openarchives.org/ore/0.9/primer
Editors (OAI Executive)
Carl Lagoze, Cornell University Information Science
Herbert Van de Sompel, Los Alamos National Laboratory
Editors (ORE Technical Committee)
Pete Johnston, Eduserv Foundation
Michael Nelson, Old Dominion University
Robert Sanderson, University of Liverpool
Simeon Warner, Cornell University Information Science

Abstract

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. This document summarizes the motivation for OAI-ORE, the technical foundations on which its solution for handling aggregations of Web resources is built, and the core aspects of that solution. This document is intended for a general audience that wants to obtain a high-level understanding of the OAI-ORE solution. Detailed information for more technically advanced readers and implementers is available in the OAI-ORE specifications and user guides.

Table of Contents

1. Why use OAI-ORE?
2. Motivating Example
3. Foundations of ORE
    3.1 Architecture of the World Wide Web
    3.2 Semantic Web, Linked Data, Cool URIs for the Semantic Web
    3.3 Resource Description Framework
4. ORE in a Nutshell
5. Resource Map Serialization
    5.1 Resource Map in RDF/XML
    5.2 Resource Map in Atom
6. Resource Maps and Aggregations on the Web
    6.1 HTTP 303
    6.2 Hash URIs
7. What should you read now?
8. References

Appendices

A. Acknowledgements
B. Change Log


1. Why use OAI-ORE?

In the physical world we create, use, and refer to aggregations of things all the time. We collect pictures in a photo album, read journals that are collections of articles, and burn CDs of our favorite songs. In this physical world these aggregations are frequently tangible - we can hold the photo album, journal, and CD. But, we also aggregate abstract entities - for example classification schemes aggregate abstract subjects into broader abstract groups.

This practice of aggregating extends to the Web. We accumulate URLs in bookmarks or favorites lists in our browser, collect photos into sets on popular sites like Flickr, browse multi-page documents that are linked together through "prev" and "next" tags, and talk about Web sites as if they had some real existence beyond the set of pages of which they consist. Despite our frequent use of these aggregations, their existence on the Web is quite ephemeral. One reason for this is that there is no standard way to identify an aggregation. We often use the URI of one page of an aggregation to identify the whole aggregation. For example, we use the URI of the first page of a multi-page Web document to identify the whole document, or we use the URI of the HTML page that provides access to a Flickr set to identify the entire set of images. But those URIs really just identify those specific pages, and not the union of pages that makes up the whole document, or the union of all images in a Flickr set, respectively. In essence, the problem is that there is no standard way to describe the constituents or boundary of an aggregation, and this is what OAI-ORE aims to provide.

Because aggregations are not well-defined on the Web, we are limited in what we can do with them, especially in terms of the services or automated processes that make the Web useful. People who wish to save or print a multi-page document must manually click through each page and invoke the appropriate browser command. Programs that transfer multi-page documents among information systems must rely on the APIs of the individual system architectures and their definition of document boundaries. Search engines must use heuristics to group individual Web pages into logical documents so that search results have the proper granularity.

This primer describes the essence of the solution that OAI-ORE provides to deal with aggregations of Web resources; it is intended for a general audience that wants to obtain a high-level understanding of the OAI-ORE solution. The primer also provides pointers to OAI-ORE specifications and implementation guidelines, which provide detailed information for implementers. This primer is structured as follows:

2. Motivating Example

The aggregation problem that ORE addresses can be explained by means of a document in the arXiv, a well-known repository of physics, mathematics, and computer science research results. The human start page for this document is shown in Figure 1. Some aspects of the page relevant to the ORE aggregation problem are highlighted in red rectangles, each with a number. The meanings of the highlighted areas are as follows:

  1. The URI http://arxiv.org/abs/astro-ph/0601007 of the human start page.
  2. The formats in which the document is available, i.e. PostScript, PDF, etc. These are effectively the constituents of the aggregation that is the arXiv document. For the remainder of this example we will also treat this human start page, the splash page, as a constituent of the aggregation.
  3. The title of the arXiv document.
  4. The authors of the arXiv document.
  5. The creation and last modification date of the arXiv document.
  6. Identifiers of entities that are in some manner comparable to this arXiv document. For example, a version of this document was later published as an article in a peer-reviewed journal, and the Digital Object Identifier of that article is shown.
  7. The versions of this document.
  8. Links to other arXiv documents in the same collection (i.e., astro-ph).
  9. Citations made by this arXiv document, and citations it received from other documents.

Figure 1: Human start page for an arXiv document

Figure of human start page

This rather simple example highlights the core issues that ORE addresses:

3. Foundations of ORE

ORE solves the aforementioned problems by introducing a URI for the aggregation that denotes the entire arXiv document, and by publishing a machine-readable document that describes that aggregation. For example, the document describes which resources are part of the aggregation, and which are merely related to it. This section briefly introduces the foundations upon which the ORE solution to the aggregation problem is built.

3.1 Architecture of the World Wide Web

The foundations of the Web as we know it are detailed in the Architecture of the World Wide Web [Web Architecture]. This architecture defines the following core notions:

3.2 Semantic Web, Linked Data, Cool URIs for the Semantic Web

On the Web that we use on a daily basis, URIs are used primarily to identify Web documents. They are identifiers that, when dereferenced, return a human-readable Representation. However, on the Semantic Web, URIs are introduced to identify so-called real world entities, such as people or cars, or even abstract entities, such as ideas or classes. Since these things are not documents, they have no Representation to indicate what these Resources mean. The Linked Data effort [Linked Data Tutorial] describes an approach for obtaining information about those Resources despite the fact that they have no Representation. To summarize, the approach consists of:

3.3 Resource Description Framework (RDF)

The documents that are proposed by the Linked Data effort to describe these abstract Resources are typically expressed in RDF/XML, which is an XML-based serialization for the Resource Description Framework (RDF) [RDF Concepts] that forms the foundational data model of the Semantic Web. This model consists of subject-predicate-object statements called triples. Triples express relationships pertaining to a subject Resource denoted by a URI. The predicate Resource, also denoted by a URI, indicates the nature of the relationship. The object expresses the actual value for the relationship expressed by the predicate; the object can be denoted by a URI or can be a literal value, such as a string or a number. When multiple triples are expressed, or asserted, they may share subjects and objects and, as a result, they conceptually join together in what is called a graph in mathematical terms. This graph consists of nodes that are the Resources denoted by the subject and object URIs, and edges that are the relationship predicates.

An example is shown in Figure 2. The figure shows four RDF triples, which are then depicted in the graph in which Resources are yellow circles that list their URIs. Note that because R1 is the subject of two triples, it has two outgoing edges in the graph. Similarly since R2 is the object of two triples, it has two incoming edges. The illustration also shows a triple that has a literal string (e.g. "John Doe") as its object.

Figure 2: Four RDF triples and their graph representation

Figure of RDF triples and graph representation
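
The way triples join into a graph can be sketched in a few lines of Python. This is only an illustration: the URIs, the predicate names, and the Literal helper class below are assumptions made for this sketch, and they are not the triples drawn in Figure 2. The sketch stores four triples and then counts outgoing and incoming edges per node.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical identifiers, chosen only for illustration.
R1 = "http://example.org/R1"
R2 = "http://example.org/R2"
R3 = "http://example.org/R3"
RELATED_TO = "http://example.org/terms/relatedTo"
NAME = "http://example.org/terms/name"


@dataclass(frozen=True)
class Literal:
    """A literal object value (string, number), as opposed to a URI."""
    value: str


# Each triple is (subject, predicate, object).
triples = [
    (R1, RELATED_TO, R2),
    (R1, RELATED_TO, R3),
    (R3, RELATED_TO, R2),
    (R2, NAME, Literal("John Doe")),
]

# Build the graph: subjects and objects are nodes, predicates are edges.
outgoing = defaultdict(list)
incoming = defaultdict(list)
for subject, predicate, obj in triples:
    outgoing[subject].append((predicate, obj))
    incoming[obj].append((predicate, subject))

print(len(outgoing[R1]))  # R1 is the subject of two triples
print(len(incoming[R2]))  # R2 is the object of two triples
```

Because R1 appears as the subject of two triples it ends up with two outgoing edges, and R2, the object of two triples, ends up with two incoming edges, which mirrors the observation made about Figure 2.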

4. ORE in a Nutshell

ORE leverages the foundations described above to arrive at a solution to handle aggregations of Web resources. The essence of the ORE solution can be summarized as follows (Figure 3):

Figure 3: The Aggregation A-1 aggregates three Resources and is described by Resource Map ReM-1

ORE basic model

5. Resource Map Serialization

ORE supports Resource Map serializations in RDF/XML, RDFa, and Atom XML. Below, examples are shown of RDF/XML and Atom XML Resource Maps that convey some essential information pertaining to the example arXiv document. Note that the URI http://arxiv.org/aggregation/astro-ph/0601007 was introduced as the HTTP URI to identify the Aggregation that denotes the arXiv document.

5.1 Resource Maps in RDF/XML

Figure 3 shows some of the core relationships introduced by the ORE Data Model, which is entirely based on RDF. Because of that, a Resource Map that describes an Aggregation can readily be expressed in RDF/XML and in other RDF serialization formats such as N3 and Turtle. Table 1 shows a simple RDF/XML Resource Map that describes the arXiv Aggregation http://arxiv.org/aggregation/astro-ph/0601007. The comments in the RDF/XML document explain how the various RDF statements relate to the ORE concepts introduced in the section ORE in a Nutshell.

Table 1: A simple Resource Map for the arXiv Aggregation serialized in RDF/XML

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/" >

    <!-- About the Aggregation for the ArXiv document -->

    <rdf:Description rdf:about="http://arxiv.org/aggregation/astro-ph/0601007">
        <!-- The Resource is an ORE Aggregation  -->
        <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
        <!-- The Aggregation aggregates ... -->
        <ore:aggregates rdf:resource="http://arxiv.org/abs/astro-ph/0601007"/>
        <ore:aggregates rdf:resource="http://arxiv.org/ps/astro-ph/0601007"/>
        <ore:aggregates rdf:resource="http://arxiv.org/pdf/astro-ph/0601007"/>
        <!-- Metadata about the Aggregation: title and authors -->
        <dc:title>Parametrization of K-essence and Its Kinetic Term</dc:title>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:name>Hui Li</foaf:name>
            <foaf:mbox rdf:resource="mailto:lihui@somewhere.cn"/>
        </dcterms:creator>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:name>Zong-Kuan Guo</foaf:name>
        </dcterms:creator>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:name>Yuan-Zhong Zhang</foaf:name>
        </dcterms:creator>
    </rdf:Description>
    
    <!-- About the Resource Map (this RDF/XML document) that describes the Aggregation -->
    
    <rdf:Description rdf:about="http://arxiv.org/rem/atom/astro-ph/0601007">
        <!-- The Resource is an ORE Resource Map  -->
        <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/ResourceMap"/>
        <!-- The Resource Map describes a specific Aggregation   -->
        <ore:describes rdf:resource="http://arxiv.org/aggregation/astro-ph/0601007"/>
        <!-- Metadata about the Resource Map: datetimes, rights, and author -->
        <dcterms:modified>2008-10-03T07:30:34Z</dcterms:modified>
        <dcterms:created>2008-10-01T18:30:02Z</dcterms:created>
        <dc:rights>This Resource Map is available under the Creative Commons Attribution-Noncommercial Generic license</dc:rights>
        <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc/2.5/rdf"/>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:page rdf:resource="http://arxiv.org"/>
            <foaf:name>arXiv.org e-Print Repository</foaf:name>
        </dcterms:creator>
    </rdf:Description>
    
    <!-- About the human start page that is part of the Aggregation -->
    
    <rdf:Description rdf:about="http://arxiv.org/abs/astro-ph/0601007">
        <dc:format>text/html</dc:format>
        <dc:title>[astro-ph/0601007] Parametrization of K-essence and Its Kinetic Term</dc:title>
        <rdf:type rdf:resource="info:eu-repo/semantics/humanStartPage"/>
    </rdf:Description>
    
    <!-- About the PostScript resource that is part of the Aggregation -->
    
    <rdf:Description rdf:about="http://arxiv.org/ps/astro-ph/0601007">
        <dc:format>application/postscript</dc:format>
        <dc:language>en</dc:language>
        <dc:title>Parametrization of K-essence and Its Kinetic Term</dc:title>
    </rdf:Description>
    
    <!-- About the PDF resource that is part of the Aggregation -->
    
    <rdf:Description rdf:about="http://arxiv.org/pdf/astro-ph/0601007">
        <dc:format>application/pdf</dc:format>
        <dc:language>en</dc:language>
        <dc:title>Parametrization of K-essence and Its Kinetic Term</dc:title>
    </rdf:Description>
</rdf:RDF>
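
As a rough illustration of how a client might consume such a Resource Map, the following Python sketch lists the Resources aggregated by a given Aggregation. It uses only the standard library XML parser and assumes the document has exactly the shape used in Table 1 (rdf:Description elements with rdf:about attributes). RDF/XML permits many equivalent forms, so a dedicated RDF parser would be the more robust choice in practice. The function name aggregated_resources and the trimmed copy of the Resource Map are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"
AGGREGATION = "http://arxiv.org/aggregation/astro-ph/0601007"

# Trimmed copy of the Resource Map from Table 1 (metadata omitted).
RESOURCE_MAP = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ore="http://www.openarchives.org/ore/terms/">
  <rdf:Description rdf:about="http://arxiv.org/aggregation/astro-ph/0601007">
    <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
    <ore:aggregates rdf:resource="http://arxiv.org/abs/astro-ph/0601007"/>
    <ore:aggregates rdf:resource="http://arxiv.org/ps/astro-ph/0601007"/>
    <ore:aggregates rdf:resource="http://arxiv.org/pdf/astro-ph/0601007"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://arxiv.org/rem/atom/astro-ph/0601007">
    <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/ResourceMap"/>
    <ore:describes rdf:resource="http://arxiv.org/aggregation/astro-ph/0601007"/>
  </rdf:Description>
</rdf:RDF>"""


def aggregated_resources(rdf_xml, aggregation_uri):
    """Return the URIs that the given Aggregation aggregates, in document order."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for description in root.findall(f"{{{RDF}}}Description"):
        # Only look at statements whose subject is the Aggregation.
        if description.get(f"{{{RDF}}}about") != aggregation_uri:
            continue
        for statement in description.findall(f"{{{ORE}}}aggregates"):
            uris.append(statement.get(f"{{{RDF}}}resource"))
    return uris


for uri in aggregated_resources(RESOURCE_MAP, AGGREGATION):
    print(uri)
```

A service that prints or archives the arXiv document could iterate over this list and fetch each constituent in turn.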

5.2 Resource Maps in Atom XML

Atom is an XML-based format that was originally designed as a mechanism for syndicating feeds from news sources, blogs, and other dynamic Web sites. In that manner it is like the many versions of RSS. The design of Atom is more recent and includes modern XML features such as namespaces; as a result, it has a flexible extensibility mechanism that allows elements and relationships from other namespaces. Because of this, in recent years the use of Atom has been extended to many purposes, such as packaging descriptions of a variety of Web Resources.

This is not a primer for Atom, and the interested reader is referred to RFC 4287, which fully describes Atom. The reader of this primer only needs to be aware of the basic entities in Atom, as shown in Figure 4, including:

Figure 4: An Atom Feed with two Atom Entries

Atom Feed/Entry structure

An Atom document may either be one feed with multiple entries as shown in Figure 4, or may just be a single entry that is not contained within a feed. ORE leverages the latter, expressing a Resource Map as an Atom entry. Table 2 shows an Atom entry that describes the arXiv Aggregation http://arxiv.org/aggregation/astro-ph/0601007. The comments in the Atom XML document explain how the various Atom elements relate to the ORE concepts introduced in the section ORE in a Nutshell.

Table 2: A simple Resource Map for the arXiv Aggregation serialized in Atom XML

<?xml version="1.0" encoding="UTF-8"?>
<atom:entry 
    xmlns:dcterms="http://purl.org/dc/terms/" 
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:ore="http://www.openarchives.org/ore/terms/" 
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <atom:id>tag:arxiv.org,2008:astro-ph:0601007</atom:id>
    
    <!-- About the Aggregation for the ArXiv document -->
    
    <!-- This Atom entry describes a specific ORE Aggregation -->
    <atom:link href="http://arxiv.org/aggregation/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/describes"/>
    <atom:category term="http://www.openarchives.org/ore/terms/Aggregation"
        scheme="http://www.openarchives.org/ore/terms/" label="Aggregation"/> 
    <!-- The Aggregation aggregates ... -->
    <atom:link href="http://arxiv.org/abs/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates"
        title="[astro-ph/0601007] Parametrization of K-essence and Its Kinetic Term"
        type="text/html" />
    <atom:link href="http://arxiv.org/ps/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates"
        title="Parametrization of K-essence and Its Kinetic Term" type="application/postscript"
        hreflang="en"/>
    <atom:link href="http://arxiv.org/pdf/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates"
        title="Parametrization of K-essence and Its Kinetic Term" type="application/pdf"
        hreflang="en"/>
    <!-- Metadata about the Aggregation: title and authors -->
    <atom:title>Parametrization of K-essence and Its Kinetic Term</atom:title>
    <atom:author>
        <atom:name>Hui Li</atom:name>
        <atom:email>lihui@somewhere.cn</atom:email>
    </atom:author>
    <atom:author>
        <atom:name>Zong-Kuan Guo</atom:name>
    </atom:author>
    <atom:author>
        <atom:name>Yuan-Zhong Zhang</atom:name>
    </atom:author>
    
    <!-- About the Resource Map (this Atom XML entry document) that describes the Aggregation -->
    
    <!-- The HTTP URI of this Resource Map -->
    <atom:link href="http://arxiv.org/rem/atom/astro-ph/0601007"
        rel="self"
        type="application/atom+xml"/>
    <!-- Metadata about the Resource Map: datetimes, rights, and author -->
    <atom:updated>2008-10-03T07:30:34Z</atom:updated>
    <atom:published>2008-10-01T18:30:02Z</atom:published>
    <atom:rights>This Resource Map is available under the Creative Commons  Attribution-Noncommercial Generic license</atom:rights>
    <atom:link href="http://creativecommons.org/licenses/by-nc/2.5/rdf"
        rel="license"
        type="application/rdf+xml"/>
    <atom:source>
        <atom:author>
            <atom:name>arXiv.org e-Print Repository</atom:name>
            <atom:uri>http://arxiv.org</atom:uri>
        </atom:author>
    </atom:source>   
    
    <!-- About the human start page that is part of the Aggregation -->
    
    <atom:link href="http://arxiv.org/abs/astro-ph/0601007" rel="alternate"/>

</atom:entry>
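
In the Atom serialization the ORE relationships are carried by the rel attribute of atom:link elements. The following Python sketch groups the links of an entry by their rel value, which is enough to recover the described Aggregation, the aggregated Resources, and the URI of the Resource Map itself. The function name links_by_rel and the trimmed copy of the entry are assumptions made for this sketch; it is not a complete Atom or ORE processor.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ORE_TERMS = "http://www.openarchives.org/ore/terms/"

# Trimmed copy of the Atom entry from Table 2 (metadata omitted).
ATOM_ENTRY = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
  <atom:id>tag:arxiv.org,2008:astro-ph:0601007</atom:id>
  <atom:link href="http://arxiv.org/aggregation/astro-ph/0601007"
      rel="http://www.openarchives.org/ore/terms/describes"/>
  <atom:link href="http://arxiv.org/abs/astro-ph/0601007"
      rel="http://www.openarchives.org/ore/terms/aggregates" type="text/html"/>
  <atom:link href="http://arxiv.org/ps/astro-ph/0601007"
      rel="http://www.openarchives.org/ore/terms/aggregates"
      type="application/postscript"/>
  <atom:link href="http://arxiv.org/pdf/astro-ph/0601007"
      rel="http://www.openarchives.org/ore/terms/aggregates"
      type="application/pdf"/>
  <atom:link href="http://arxiv.org/rem/atom/astro-ph/0601007"
      rel="self" type="application/atom+xml"/>
</atom:entry>"""


def links_by_rel(entry_xml):
    """Map each rel value in the entry to the list of href values that use it."""
    root = ET.fromstring(entry_xml)
    grouped = {}
    for link in root.findall(ATOM + "link"):
        grouped.setdefault(link.get("rel"), []).append(link.get("href"))
    return grouped


links = links_by_rel(ATOM_ENTRY)
print("Aggregation: ", links[ORE_TERMS + "describes"][0])
print("Resource Map:", links["self"][0])
for href in links[ORE_TERMS + "aggregates"]:
    print("aggregates:  ", href)
```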

6. Resource Maps and Aggregations on the Web

When a Resource Map is published on the Web, its URI can be dereferenced by an HTTP protocol request that returns an RDF/XML or Atom XML document as shown in the section Resource Map Serialization. Clients and agents can then interpret that document and provide enhanced services based on the included information. These include navigation, printing, archiving, visualizing, and transforming the Aggregation.

The reverse functionality is also important. Clients that get access to the HTTP URI of an Aggregation, via a citation or another form of linking, should be able to discover that the Resource identified by that HTTP URI is indeed an Aggregation, and to subsequently access a Resource Map that describes the Aggregation.

As noted, however, an Aggregation is one of those special Semantic Web resources: dereferencing its URI via an HTTP protocol request does not yield a Representation. This section briefly describes two methods that ORE recommends for gaining access to a Resource Map that describes an Aggregation, given the HTTP URI of that Aggregation. These two methods are based on guidelines from the Semantic Web community that are fully articulated in the Cool URIs for the Semantic Web specification [Cool URIs].

6.1 HTTP 303 Forwarding from the Aggregation URI to the Resource Map URI

This method is appropriate in applications where the party that introduces an Aggregation and a Resource Map that describes it has control over a Web server. It is also the recommended approach when Resource Maps in multiple formats, such as both Atom XML and RDF/XML, are published to describe the same Aggregation.

The mechanics of this method are as follows. When the server receives an HTTP request for the Aggregation URI A-1, it returns an HTTP 303 status code (which means "See Other") with a redirection to the Resource Map URI ReM-1. The browser or agent may then make a new HTTP request for ReM-1. Requests for URI A-1 can also employ 303 redirection with content negotiation [RFC2616, Cool URIs] to include redirection to a Resource Map in one of several formats.

Example URIs are:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.rdf

and additional serializations may be added following the URI pattern:

               ReM-2 = http://example.org/foo.atom
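
The server-side decision can be sketched as a small Python function that returns the status code and headers for a request. This is a simplified illustration and not a prescribed implementation: the function name respond, the choice of RDF/XML as the default format, and the naive handling of the Accept header (quality values are ignored) are all assumptions made for this sketch.

```python
# Example URIs from the text: A-1 is served at path /foo on example.org.
AGGREGATION_PATH = "/foo"

# Resource Maps available for A-1, keyed by media type.
RESOURCE_MAPS = {
    "application/rdf+xml": "http://example.org/foo.rdf",    # ReM-1
    "application/atom+xml": "http://example.org/foo.atom",  # ReM-2
}
DEFAULT_TYPE = "application/rdf+xml"  # assumed default for this sketch


def respond(path, accept=""):
    """Return (status, headers) for a GET request on the given path."""
    if path != AGGREGATION_PATH:
        return 404, {}
    # Naive content negotiation: take the first listed media type that
    # we can serve. A real server would also honor q-values.
    chosen = DEFAULT_TYPE
    for item in accept.split(","):
        media_type = item.split(";")[0].strip()
        if media_type in RESOURCE_MAPS:
            chosen = media_type
            break
    # 303 See Other: the Aggregation has no Representation of its own,
    # so the client is pointed at a Resource Map that describes it.
    return 303, {"Location": RESOURCE_MAPS[chosen], "Vary": "Accept"}


print(respond("/foo", "application/atom+xml"))
print(respond("/foo", "text/html, application/rdf+xml;q=0.9"))
```

A client that receives the 303 response issues a second request to the URI in the Location header to retrieve the Resource Map.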

6.2 The Aggregation URI is a Hash URI

This method does not require that the party that introduces an Aggregation and a Resource Map controls a Web server. The URI of the Aggregation A-1 is constructed by appending a fragment identifier #aggregation to the Resource Map URI ReM-1. Example URIs are:

Aggregation:  A-1   = http://example.org/foo.rdf#aggregation
Resource Map: ReM-1 = http://example.org/foo.rdf

As defined by HTTP [RFC2616], an agent should strip off the fragment identifier before issuing an HTTP request to the server. The result is that the server request is actually to ReM-1. But, by introducing the fragment identifier, the URIs A-1 and ReM-1 still identify different Resources as defined by the Architecture of the World Wide Web [Web Architecture].
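
The fragment stripping can be illustrated with the Python standard library. The helper name request_uri is an assumption made for this sketch; urldefrag is the standard library function that separates a URI from its fragment identifier.

```python
from urllib.parse import urldefrag

AGGREGATION = "http://example.org/foo.rdf#aggregation"  # A-1
RESOURCE_MAP = "http://example.org/foo.rdf"             # ReM-1


def request_uri(uri):
    """Return the URI an agent actually requests: the URI without its fragment."""
    base, _fragment = urldefrag(uri)
    return base


# The two URIs are different identifiers ...
print(AGGREGATION != RESOURCE_MAP)
# ... but a request for the Aggregation goes to the Resource Map URI.
print(request_uri(AGGREGATION) == RESOURCE_MAP)
```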

7. What should you read now?

This primer has briefly introduced concepts and approaches used in the OAI-ORE specifications. Interested readers, especially those intending to implement these specifications, should obtain further details in the following documents.

8. References

[Cool URIs]
Cool URIs for the Semantic Web Leo Sauermann, Richard Cyganiak, Max Völkel, 2007-08-09. Available at http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/ . Also being developed into a W3C Working Draft available at http://www.w3.org/TR/cooluris/
[Linked Data Tutorial]
How to Publish Linked Data on the Web, Chris Bizer, Richard Cyganiak, Tom Heath, 2007-07-27. Available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/20070727/ . Latest version available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ .
[RDFXML]
RDF/XML Syntax Specification (Revised), Dave Beckett and Brian McBride, Editors, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/ . Latest version available at http://www.w3.org/TR/rdf-syntax-grammar/ .
[RDF Concepts]
Resource Description Framework (RDF): Concepts and Abstract Syntax, Graham Klyne and Jeremy J. Carroll, Editors, W3C Recommendation, 10 February 2004, Available at http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ . Latest version available at http://www.w3.org/TR/rdf-concepts/ .
[RFC2616]
RFC2616: Hypertext Transfer Protocol - HTTP/1.1, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, IETF RFC2616, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt
[RFC3986]
RFC3986: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, IETF RFC3986, January 2005. Available at http://www.ietf.org/rfc/rfc3986.txt
[Web Architecture]
Architecture of the World Wide Web, Volume One, I. Jacobs and N. Walsh, Editors, World Wide Web Consortium, 15 January 2004. Available at http://www.w3.org/TR/webarch/ .

A. Acknowledgements

This document is the work of the Open Archives Initiative. Funding for Open Archives Initiative Object Reuse and Exchange is provided by the Andrew W. Mellon Foundation, Microsoft, and the National Science Foundation.  Additional support is provided by the Coalition for Networked Information.

This document is based on meetings of the OAI-ORE Technical Committee (ORE-TC), with participation from the OAI-ORE Liaison Group (ORE-LG). Members of the ORE-TC are: Chris Bizer (Freie Universität Berlin), Les Carr (University of Southampton), Tim DiLauro (Johns Hopkins University), Leigh Dodds (Ingenta), David Fulker (UCAR), Tony Hammond (Nature Publishing Group), Pete Johnston (Eduserv Foundation), Richard Jones (Imperial College), Peter Murray (OhioLINK), Michael Nelson (Old Dominion University), Ray Plante (NCSA and National Virtual Observatory), Rob Sanderson (University of Liverpool), Simeon Warner (Cornell University), and Jeff Young (OCLC). Members of ORE-LG are: Leonardo Candela (DRIVER), Tim Cole (DLF Aquifer and UIUC Library), Julie Allinson (JISC), Jane Hunter (DEST), Savas Parastatidis (Microsoft), Sandy Payette (Fedora Commons), Thomas Place (DARE and University of Tilburg), Andy Powell (DCMI), and Robert Tansley (Google, Inc. and DSpace).

We also acknowledge comments from the OAI-ORE Advisory Committee (ORE-AC).

B. Change Log

Date Editor Description
2008-10-17 carl/herbert public 1.0 release (complete rewrite)
2008-06-02 simeon public beta 0.9 release
2008-04-03 simeon public alpha 0.3 release
2008-03-02 simeon public alpha 0.2 release
2007-12-10 simeon public alpha 0.1 release
2007-10-15 simeon alpha release to ORE-TC

Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
