ORE User Guide - HTTP Implementation and Multiple Serializations

Abstract

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. This document describes implementation of OAI-ORE using HTTP [RFC2616], the most widely used protocol of the current World Wide Web. Mechanisms that support multiple Resource Maps in different serializations are described in detail. This user guide is one of several documents comprising the OAI-ORE specification and user guide.

1. Introduction

The use of HTTP URIs to identify ORE Aggregations and Resource Maps leverages the extensive infrastructure and tools of the current World Wide Web [Web Architecture]. HTTP is the best supported protocol of current web browsers, crawlers, search engines, feed aggregators, and many other tools. HTTP provides mechanisms that allow the Aggregation to yield or redirect to a Resource Map as required by the ORE Data Model. HTTP is thus the RECOMMENDED protocol and associated URI scheme for ORE Aggregations and Resource Maps.

There may be one or more Resource Maps that describe a particular Aggregation. These will likely differ in their serialization format, in serialization-specific metadata (e.g. creation time), and so on, and are thus separate resources from a Web Architecture standpoint. Each Resource Map should therefore have a different URI (ReM-1, ReM-2 etc.), and it is incorrect to make multiple Resource Maps available from a single URI via content negotiation.

In application domains such as scholarly communication, there are already many aggregations of resources on the web. These are often described by HTML "splash pages" such as http://arxiv.org/abs/astro-ph/0601007, which provide description of and access to components. Splash pages and the URIs that identify them are NOT ORE Aggregations or Resource Maps. However, with RDFa and microformats it is possible to embed a Resource Map in a splash page, and we discuss this case below. If there exists a splash page that does not contain an RDFa or microformat representation of a Resource Map, then that page should not be available via content negotiation from the Aggregation.

This document is divided into several sections which describe different implementation scenarios. These scenarios differ in the server requirements needed to support them, and in the URI structure that results. Section 2 describes a clean and extensible implementation strategy requiring some HTTP server support. This is the RECOMMENDED strategy. Section 3 describes a limited but very simple implementation strategy that requires no HTTP server support beyond the ability to serve files. Section 4 describes implementation with RDFa or microformats, either alone or in addition to other formats.

Is it correct and/or helpful to use the information resource and non-information resource terminology when describing implementation strategies? Does the type of A-1 change depending on implementation choice?

2. Cool URI implementation with some HTTP server support

This implementation strategy is motivated by the desire to use Cool URIs and to allow easy extensibility to new or additional serializations. We first describe the simple case of one Resource Map available to describe an Aggregation, and the mechanisms used to tie these two URIs together. Section 2.2 then extends this to the case of multiple Resource Maps describing the same Aggregation.

2.1 Cool URIs for one Resource Map describing an Aggregation

Consider the following example Aggregation and Resource Map URIs:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.xml

Both A-1 and ReM-1 SHOULD be resolvable. The Resource Map, with URI ReM-1, is an information resource and access SHOULD yield a representation of the Resource Map (in this case an Atom serialization, see [Atom Profile]). The Aggregation, with URI A-1, is described by the Resource Map available from ReM-1 and access to A-1 SHOULD lead a user or agent to the Resource Map. There are two good mechanisms for doing this in HTTP -- content negotiation and redirection:

Content negotiation -- even though there is just one Resource Map available to describe Aggregation A-1, HTTP transparent content negotiation can be used to return the Resource Map from A-1. The mechanism is described in [RFC2295]; see also [Apache Content Negotiation] for an example implementation. The key element of the process is that when a client requests A-1, the server may instead respond with a Resource Map. The response MUST include a Content-Location header indicating that the response is actually from URI ReM-1.
Redirection -- as an alternative to content negotiation, a server may respond to a request for A-1 with a 303 See Other redirect to ReM-1. This strategy is described in the [Linked Data Tutorial].
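
The two mechanisms can be sketched as follows. This is a minimal illustration only, not a reference implementation: the function name respond, the strategy labels, the lookup tables and the placeholder Atom body are all assumptions made for the example, and a real server would wire the same logic into its own request handling.

```python
# Hypothetical lookup tables for the example URIs above (not part of ORE).
BASE = "http://example.org"
REM_FOR_AGGREGATION = {"/foo": "/foo.xml"}  # A-1 path -> ReM-1 path
# Placeholder body; a real server would return the stored Atom serialization.
REM_BODIES = {"/foo.xml": b"<feed xmlns='http://www.w3.org/2005/Atom'/>"}


def respond(path, strategy="redirect"):
    """Return (status, headers, body) for a GET request on `path`.

    strategy="redirect": answer requests for the Aggregation with 303 See Other.
    strategy="conneg":   return the Resource Map directly, with Content-Location.
    """
    if path in REM_BODIES:
        # Direct access to the Resource Map URI (ReM-1).
        return 200, {"Content-Type": "application/atom+xml"}, REM_BODIES[path]

    rem_path = REM_FOR_AGGREGATION.get(path)
    if rem_path is None:
        return 404, {}, b""

    if strategy == "redirect":
        # Redirection: point the client at ReM-1 and send no body.
        return 303, {"Location": BASE + rem_path}, b""

    # Content negotiation: serve the Resource Map from A-1 and state
    # that the representation actually comes from ReM-1.
    headers = {
        "Content-Type": "application/atom+xml",
        "Content-Location": BASE + rem_path,
    }
    return 200, headers, REM_BODIES[rem_path]
```

With these assumptions, respond("/foo", "redirect") gives a 303 whose Location is ReM-1, while respond("/foo", "conneg") gives a 200 carrying the Resource Map and a Content-Location of ReM-1.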

The URIs A-1 and ReM-1 do not have to be related in the manner shown above, although this is one common arrangement and is supported by Apache. While the appropriate choice for a given system will likely be influenced by other considerations, it should not be forgotten that good URIs do not change [URI Style] and that later expansion is often required as systems evolve.

2.2 Multiple Resource Maps with Cool URIs

If more than one Resource Map is available to describe an Aggregation, perhaps an Atom serialization and an RDF/XML serialization, then each Resource Map SHOULD be available from a different URI. Consider adding ReM-2 to the example above:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.xml
               ReM-2 = http://example.org/foo.rdf

The Aggregation and each Resource Map have good URIs, and the scheme is easily extended to additional Resource Maps simply by adding ReM-3 etc. It is an implementation decision as to which Resource Map is considered the default. The serialization most useful to a simple web browser is likely the best choice, and at present that is Atom if available. Either transparent content negotiation or redirection may be used to handle client accesses to the Aggregation URI.
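
One way a server might pick among several Resource Maps is to inspect the Accept header of a request for A-1. The sketch below is a deliberate simplification and not the full algorithm of [RFC2295]: it ignores wildcards and all parameters other than q, and falls back to the default serialization when nothing matches. The names RESOURCE_MAPS, DEFAULT_TYPE, parse_accept and choose_resource_map are assumptions made for illustration.

```python
# Hypothetical table for the example above: media type -> Resource Map URI.
RESOURCE_MAPS = {
    "application/atom+xml": "http://example.org/foo.xml",  # ReM-1
    "application/rdf+xml": "http://example.org/foo.rdf",   # ReM-2
}
# Which serialization is the default is an implementation decision.
DEFAULT_TYPE = "application/atom+xml"


def parse_accept(header):
    """Return [(media_type, q)] from an Accept header; only q is interpreted."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((parts[0].lower(), q))
    return prefs


def choose_resource_map(accept_header):
    """Pick the Resource Map URI to return, or redirect to, for a request on A-1."""
    best_type, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header or ""):
        if media_type in RESOURCE_MAPS and q > best_q:
            best_type, best_q = media_type, q
    return RESOURCE_MAPS[best_type or DEFAULT_TYPE]
```

The chosen URI can then be used either as the target of a 303 redirect or as the Content-Location of a content-negotiated response. Adding ReM-3 amounts to adding one more entry to the table.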

To aid in discovery, it is RECOMMENDED that where there are multiple Resource Maps available for an Aggregation, and this is known when a Resource Map is generated, the availability of the other Resource Maps be indicated using the ore:isDescribedBy predicate. For example, ReM-1 might include the triples (shown in N3 format):

ReM-1 ore:describes A-1.
A-1 ore:isDescribedBy ReM-2.   #discovery of ReM-2 from ReM-1
A-1 ore:isDescribedBy ReM-3.   #discovery of ReM-3 from ReM-1
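
When Resource Maps are generated by software, the discovery triples can be produced from the list of known Resource Map URIs. The following is a small sketch under that assumption; the function name discovery_triples is hypothetical, and the declaration of the ore: prefix is omitted here just as in the example above.

```python
def discovery_triples(this_rem, aggregation, all_rems):
    """Build N3 lines for `this_rem`: one ore:describes statement, plus an
    ore:isDescribedBy statement for every other known Resource Map."""
    lines = [f"<{this_rem}> ore:describes <{aggregation}> ."]
    for rem in all_rems:
        if rem != this_rem:  # do not list the Resource Map being generated
            lines.append(f"<{aggregation}> ore:isDescribedBy <{rem}> .")
    return lines


# Usage with the example URIs from section 2.2.
rems = ["http://example.org/foo.xml", "http://example.org/foo.rdf"]
for line in discovery_triples(rems[0], "http://example.org/foo", rems):
    print(line)
```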

3. Simple implementation without server support for content negotiation or redirection

Without support from a web server one cannot use the techniques above to arrange that an attempt to access the Aggregation yields or redirects to a Resource Map. A way around this limitation is to relate the URIs with a fragment identifier [RFC3986]. For example, with URIs:

Aggregation:  A-1   = http://example.org/foo.xml
Resource Map: ReM-1 = http://example.org/foo.xml#rem

Using #rem to identify the Resource Map leads to clean references to the Aggregation without creating a second URI (once the fragment identifier is removed) that must resolve. Another option would be to make A-1 and ReM-1 completely different URIs (neither with a fragment identifier), say A-1 = http://example.org/bar.xml and ReM-1 = http://example.org/foo.xml. ReM-1 must return the Resource Map as described, and A-1 should also return a Resource Map (though not necessarily exactly the same one) or some other format with enough information to lead a client to ReM-1. Yet another option would be to use a fragment identifier for the Aggregation instead of for the Resource Map. This approach leads to ugly URIs for the Aggregation (which we expect to be linked to) but does provide a cleaner migration path from the hash approach to a Cool URI approach (one can assert foo owl:sameAs foo.xml#aggregation when a new foo URI for the Aggregation is created).

Resolution of fragment identifiers is defined to be a client-side behavior, so any client seeing an HTTP URI with a fragment identifier, e.g. uri#fragment, will remove the #fragment and access uri. Only when a response is obtained might the client try to identify the correct fragment. In practice this means that a request for either A-1 or ReM-1 above will yield the Resource Map at http://example.org/foo.xml.
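
The client-side behavior described above can be illustrated with the Python standard library function urllib.parse.urldefrag, which splits a URI into the part that is sent to the server and the fragment that is kept by the client. The helper name request_uri is an assumption made for this sketch.

```python
from urllib.parse import urldefrag


def request_uri(uri):
    """Return the URI a client actually dereferences: `uri` minus any fragment."""
    return urldefrag(uri).url


aggregation = "http://example.org/foo.xml"       # A-1
resource_map = "http://example.org/foo.xml#rem"  # ReM-1

# Both identifiers lead to the same HTTP request.
same_request = request_uri(aggregation) == request_uri(resource_map)

# The fragment remains on the client side and is never sent to the server.
fragment = urldefrag(resource_map).fragment
```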

The fragment identifier permits precise differentiation between the Resource Map and the Aggregation, so that statements can be made about the appropriate resource. At the same time, it satisfies the requirement that a Resource Map can be obtained both via the Aggregation URI A-1 and directly from ReM-1.

The migration path from this approach to a more complex solution with multiple serializations is somewhat messy. Later support for multiple serializations can be added in one of two ways:

Change URIs to adopt the Cool URI strategy. This will break existing assertions if A-1 is renamed and the URI http://example.org/foo.xml is reused for a Resource Map. However, clients would at least get a Resource Map back, so a sufficiently smart client might be able to unravel the inconsistency. After migration to support both the original http://example.org/foo.xml Atom Resource Map and a new http://example.org/foo.rdf RDF/XML Resource Map, the set of URIs might be:
```
Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.xml
               ReM-2 = http://example.org/foo.rdf
```
Preserve existing URIs while adding other formats. This leads to a rather ugly and non-standard set of URIs which would require custom server support. If we add a new http://example.org/foo.rdf RDF/XML Resource Map the set of URIs might be:
```
Aggregation:   A-1   = http://example.org/foo.xml
Resource Map:  ReM-1 = http://example.org/foo.xml#rem
               ReM-2 = http://example.org/foo.rdf
```

It would be possible to extend the fragment identifier scheme described in combination with content negotiation to handle multiple serializations. However, this would go against standard web practices and is NOT RECOMMENDED. The Multiple Resource Maps with Cool URIs strategy is a much better approach.

4. Implementation with RDFa or microformats

RDFa and microformats provide means to include structured data, such as a Resource Map, within an XHTML or HTML page. A profile for use of RDFa [RDFa] to serialize Resource Maps is given in [RDF Resource Maps]. With RDFa and microformats an HTML "splash page" may also take on the dual role of a Resource Map serialization.

4.1 RDFa or microformats with Cool URIs

In the case of a Cool URI implementation, the URI of the (X)HTML page containing the RDFa or microformat Resource Map is treated in the same way as any other Resource Map URI for a given Aggregation. If the HTML page contains the only Resource Map serialization then one might have URIs:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.html   (includes RDFa Resource Map)

If there are multiple serializations then the default content-negotiated result or redirect should be to the HTML page. This will ensure that a web browser receives the most helpful version of the Resource Map in response to an attempt to access the Aggregation with no preference information. If Resource Maps were available in XHTML/RDFa, Atom and RDF/XML the URIs might be:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.html   (includes RDFa Resource Map)
               ReM-2 = http://example.org/foo.xml
               ReM-3 = http://example.org/foo.rdf

4.2 RDFa or microformats without server support

In the case of a simple implementation without server support, the (X)HTML page containing the RDFa or microformat Resource Map serialization must have the Aggregation URI A-1:

Aggregation:   A-1   = http://example.org/foo.html
Resource Map:  ReM-1 = http://example.org/foo.html#rem

The RDFa or microformat data must be written so that the URIs above are used in statements. The Resource Map URI is http://example.org/foo.html#rem and not the page URI http://example.org/foo.html. This is possible using the Evaluation Context notion in RDFa.

5. References

[Apache Content Negotiation]: Apache HTTP Server Version 2.2 - Content Negotiation, The Apache Software Foundation, 2008. Available at http://httpd.apache.org/docs/2.2/content-negotiation.html
[Atom Profile]: ORE Specification - Resource Map Profile of Atom, Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, Simeon Warner (editors), 2008-02-28. Available at http://www.openarchives.org/ore/0.3/atom
[Cool URIs]: Cool URIs for the Semantic Web, Leo Sauermann, Richard Cyganiak, Max Völkel, 2007-08-09. Available at http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/. Also being developed into a W3C Working Draft at http://www.w3.org/TR/cooluris/
[Data Model]: ORE Specification - Abstract Data Model, Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, Simeon Warner (editors), 2008-02-26. Available at http://www.openarchives.org/ore/0.3/datamodel
[Linked Data Tutorial]: Linked Data Tutorial, Chris Bizer, Richard Cyganiak, Tom Heath, 2007-07-27. Available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
[RFC2295]: IETF RFC 2295: Transparent Content Negotiation, K. Holtman, A. Mutz, 1998-03. Available at http://www.ietf.org/rfc/rfc2295.txt
[RFC2616]: IETF RFC 2616: Hypertext Transfer Protocol - HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt
[RFC3986]: IETF RFC 3986: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, January 2005. Available at http://www.ietf.org/rfc/rfc3986.txt
[RDFa]: RDFa in XHTML: Syntax and Processing. A collection of attributes and processing rules for extending XHTML to support RDF, Ben Adida, Mark Birbeck, Shane McCarron and Steven Pemberton (editors). W3C Working Draft, 2007-10-18. Available at http://www.w3.org/TR/2007/WD-rdfa-syntax-20071018/ and latest version available at http://www.w3.org/TR/rdfa-syntax/
[RDF Resource Maps]: ORE User Guide - Representing Resource Maps Using RDF Syntaxes, Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, Simeon Warner (editors), 2008-02-29. Available at http://www.openarchives.org/ore/0.3/rdfsyntax
[URI Style]: Cool URIs don't change, Tim Berners-Lee, 1998. Available at http://www.w3.org/Provider/Style/URI
[Web Architecture]: Architecture of the World Wide Web, Volume One, I. Jacobs and N. Walsh, Editors, World Wide Web Consortium, 15 January 2004. Available at http://www.w3.org/TR/webarch/