ORE User Guide - HTTP Implementation and Multiple Serializations

Abstract

Open Archives Initiative Object Reuse and Exchange (OAI-ORE) defines standards for the description and exchange of aggregations of Web resources. This document describes implementation of OAI-ORE using HTTP [RFC2616], the most widely used protocol of the current World Wide Web. Mechanisms that support multiple Resource Maps in different serializations are described in detail. This user guide is one of several documents comprising the OAI-ORE specification and user guide.

1. Introduction

The use of HTTP URIs to identify ORE Aggregations and Resource Maps leverages the extensive infrastructure and tools of the current World Wide Web [Web Architecture]. HTTP is the best supported protocol of current web browsers, crawlers, search engines, feed aggregators, and many other tools. HTTP provides mechanisms that allow the Aggregation, which is a non-information resource in the sense of the Web Architecture, to yield or redirect to a Resource Map as required by the ORE Model [Data Model]. HTTP is thus the RECOMMENDED protocol and associated URI scheme for ORE Aggregations and Resource Maps.

There may be one or more Resource Maps that describe a particular Aggregation. These will likely differ in their serialization format and serialization specific metadata (e.g. creation time), and are thus separate resources from a Web Architecture standpoint. Each Resource Map should have a different URI (ReM-1, ReM-2, etc.) and it is incorrect to make multiple Resource Maps available from a single URI via content negotiation.

In application domains such as scholarly communication, there are already many aggregations of resources on the web. These are often described by HTML "Splash Pages" such as http://arxiv.org/abs/astro-ph/0601007, which provide a description of the aggregation and access to its components. Splash Pages and the URIs that identify them are NOT ORE Aggregations or Resource Maps. However, with RDFa and microformats it is possible to embed a Resource Map in a Splash Page and we discuss this case below. If there exists a Splash Page that does not contain an RDFa or microformat representation of a Resource Map then that page should not be available via content negotiation from the Aggregation.

This document is divided into sections which describe different HTTP implementation scenarios. These scenarios differ in the server requirements needed to support them, and in the URI structure that results. Section 2 describes a clean and extensible implementation strategy requiring some HTTP server support. This is the RECOMMENDED strategy. Section 3 describes a limited but very simple implementation strategy that requires no HTTP server support beyond the ability to serve files. Section 4 describes implementation with RDFa or microformats either alone or in addition to other formats. Finally, section 5 gives the recommended behaviour of HTTP Proxy URIs and details of the ORE Proxy URI resolver at http://oreproxy.org/r.

2. Cool URI implementation with some HTTP server support

This implementation strategy is motivated by the desire to use Cool URIs and to allow easy extensibility to new or additional serializations. We first consider the simple case of one Resource Map available to describe an Aggregation, and the mechanisms used to tie the Aggregation and Resource Map resources together. Section 2.2 then extends this to the case of multiple Resource Maps describing the same Aggregation.

2.1 Cool URIs for one Resource Map describing an Aggregation

Consider the following example Aggregation and Resource Map URIs:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.atom

Both A-1 and ReM-1 SHOULD be resolvable. The Resource Map, with URI ReM-1, is an information resource and access SHOULD yield a representation of the Resource Map (in this case an Atom serialization, see [Atom Resource Maps]). The Aggregation, with URI A-1, is described by the Resource Map available from ReM-1 and access to A-1 SHOULD lead a user or agent to the Resource Map. There are two good mechanisms for doing this in HTTP -- content negotiation and redirection:

Content negotiation -- even though there is just one Resource Map available to describe Aggregation A-1, HTTP transparent content negotiation can be used to return the Resource Map from A-1. The mechanism is described in [RFC2295] (see also Apache Content Negotiation for an example implementation). The key element of the process is that when a client requests A-1, the server may instead respond with a Resource Map. The response MUST include a Content-Location header that indicates that the response is actually from URI ReM-1.
Redirection -- as an alternative to content negotiation, a server may respond to a request for A-1 with a 303 See Other redirect to ReM-1. This strategy is described in the Linked Data Tutorial.
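
The two mechanisms above can be sketched as the responses a server might give to a GET on A-1. This is a minimal illustration rather than a server implementation: the function name `respond_to_aggregation` and the strategy labels are assumptions made for this example, and the Atom media type is assumed for ReM-1.

```python
# URIs taken from the example above.
A_1 = "http://example.org/foo"
REM_1 = "http://example.org/foo.atom"


def respond_to_aggregation(strategy):
    """Return (status, headers) for a GET on A-1 (hypothetical helper)."""
    if strategy == "redirect":
        # 303 See Other: the client is expected to issue a new GET on ReM-1.
        return 303, {"Location": REM_1}
    if strategy == "conneg":
        # Content negotiation: the Resource Map is returned directly and
        # Content-Location says which URI the response actually comes from.
        return 200, {
            "Content-Type": "application/atom+xml",
            "Content-Location": REM_1,
        }
    raise ValueError(f"unknown strategy: {strategy!r}")
```

In both cases the client learns the Resource Map URI ReM-1, either from the Location header or from the Content-Location header.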

The URIs A-1 and ReM-1 do not have to be related in the manner shown above although this is one common arrangement and is supported by Apache. While the appropriate choice for a given system will likely be influenced by other considerations, it should not be forgotten that "good URIs do not change" [URI Style] and that later expansion is often required as systems evolve.

2.2 Multiple Resource Maps with Cool URIs

If more than one Resource Map is available to describe an Aggregation, perhaps an Atom serialization and an RDF/XML serialization, then each Resource Map SHOULD be available from a different URI. Consider adding ReM-2 to the example above:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.atom
               ReM-2 = http://example.org/foo.rdf

The Aggregation and each Resource Map have good URIs, and the scheme is easily extended to additional Resource Maps simply by adding them at new URIs (ReM-3 etc.). It is an implementation decision as to which Resource Map is considered the default. The serialization most useful to a simple web browser is likely the best choice and at present that is Atom if available. Either transparent content negotiation or redirection may be used to handle client accesses to the Aggregation URI.
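
As a rough sketch of how a server might pick among several Resource Maps, the following selects a Resource Map URI from a client's Accept header and falls back to the Atom default. It is a simplified Accept-header selection, not the full algorithm of [RFC2295]: wildcards are not matched and simply fall through to the default. The names `RESOURCE_MAPS`, `parse_accept` and `choose_resource_map` are assumptions made for this example.

```python
# Hypothetical mapping from media type to Resource Map URI (ReM-1, ReM-2).
RESOURCE_MAPS = {
    "application/atom+xml": "http://example.org/foo.atom",
    "application/rdf+xml": "http://example.org/foo.rdf",
}
DEFAULT_TYPE = "application/atom+xml"


def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep the client's order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_resource_map(accept_header=""):
    """Return the URI of the Resource Map to serve or redirect to."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in RESOURCE_MAPS:
            return RESOURCE_MAPS[media_type]
    # No usable preference (empty header, wildcard only): use the default.
    return RESOURCE_MAPS[DEFAULT_TYPE]
```

The returned URI can then be used either as the Content-Location of a negotiated response or as the Location of a 303 redirect.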

To aid in discovery, it is RECOMMENDED that where there are multiple Resource Maps available for an Aggregation and this is known when a Resource Map is generated, the availability of the other Resource Maps should be indicated using the ore:isDescribedBy predicate. For example, ReM-1 might include the triples (shown in N3 format):

ReM-1 ore:describes A-1.
A-1 ore:isDescribedBy ReM-2.   #discovery of ReM-2 from ReM-1
A-1 ore:isDescribedBy ReM-3.   #discovery of ReM-3 from ReM-1
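
A small sketch of generating such discovery triples when a Resource Map is written out is given below. The helper name `discovery_triples` is an assumption made for this example, and the `ore:` prefix declaration is omitted, as in the triples above.

```python
def discovery_triples(rem, aggregation, other_rems):
    """Build N3 lines stating which Aggregation `rem` describes and which
    other Resource Maps also describe it (hypothetical helper)."""
    lines = [f"<{rem}> ore:describes <{aggregation}>."]
    for other in other_rems:
        lines.append(f"<{aggregation}> ore:isDescribedBy <{other}>.")
    return "\n".join(lines)


# Example: ReM-1 (Atom) advertising ReM-2 (RDF/XML) for Aggregation A-1.
print(discovery_triples(
    "http://example.org/foo.atom",
    "http://example.org/foo",
    ["http://example.org/foo.rdf"],
))
```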

3. Simple implementation without server support for content negotiation or redirection

Without support from a web server one cannot use the techniques above to arrange that an attempt to access the Aggregation yields or redirects to a Resource Map. A way around this limitation is to relate the URIs with a fragment identifier [RFC3986]. For example, the URIs might be:

Aggregation:  A-1   = http://example.org/foo.atom#aggregation
Resource Map: ReM-1 = http://example.org/foo.atom

Resolution of fragment identifiers is defined to be a client-side behavior, so any client seeing an HTTP URI with a fragment identifier, e.g. uri#fragment, will remove the #fragment and access uri. Only when a response is obtained might the client try to identify the correct fragment. In practice this means that either A-1 or ReM-1 above will yield the Resource Map at http://example.org/foo.atom. The use of a URI with a fragment identifier to identify a non-information resource, such as the Aggregation, is discussed further in [Linked Data Tutorial] and [Cool URIs].

Use of a fragment identifier permits precise differentiation between the Resource Map and the Aggregation so that statements can be made about the appropriate resource. It also satisfies the requirement that a Resource Map can be obtained both via the Aggregation URI A-1 and directly from ReM-1.
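
The client-side behavior described above can be illustrated with Python's standard `urllib.parse.urldefrag`, which separates the URI sent to the server from the fragment the client keeps for itself. The wrapper name `request_uri` is an assumption made for this example.

```python
from urllib.parse import urldefrag


def request_uri(uri):
    """Return (URI actually requested from the server, fragment kept by the client)."""
    result = urldefrag(uri)
    return result.url, result.fragment


# Both A-1 and ReM-1 lead to a request for the same document.
print(request_uri("http://example.org/foo.atom#aggregation"))
print(request_uri("http://example.org/foo.atom"))
```

The first call yields the request URI http://example.org/foo.atom with the fragment "aggregation", and the second yields the same request URI with an empty fragment.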

3.1 Migration from a simple implementation to support multiple Resource Maps

The use of a fragment identifier for the Aggregation URI does not directly support the availability of multiple Resource Maps for a single Aggregation. Migration from this simple approach to a more complex solution with multiple serializations can be accomplished in two ways:

Change URIs to adopt the Cool URI strategy. There is no need to change the URI for the original Resource Map http://example.org/foo.atom. An additional Resource Map may be added at a new URI, say an RDF/XML Resource Map at http://example.org/foo.rdf, to give the following set of URIs:
```
Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.atom
               ReM-2 = http://example.org/foo.rdf
```
With the new URI arrangement, clients attempting to access the old Aggregation URI http://example.org/foo.atom#aggregation would still find a Resource Map, and a sufficiently smart client might be able to unravel the inconsistency that there is no description of the resource http://example.org/foo.atom#aggregation. However, the process may be made explicit by updating the Resource Maps to include a statement that http://example.org/foo and http://example.org/foo.atom#aggregation identify the same resource:
```
<http://example.org/foo>  owl:sameAs  <http://example.org/foo.atom#aggregation>.
```
Preserve existing URIs while adding other formats. This leads to a rather ugly and non-standard set of URIs but is otherwise straightforward. If a new RDF/XML Resource Map were added at http://example.org/foo.rdf the set of URIs might be:
```
Aggregation:   A-1   = http://example.org/foo.atom#aggregation
Resource Map:  ReM-1 = http://example.org/foo.atom
               ReM-2 = http://example.org/foo.rdf
```

It would be possible to extend the fragment identifier scheme described in combination with content negotiation to handle multiple serializations. However, this would go against standard web practices and is NOT RECOMMENDED. The Multiple Resource Maps with Cool URIs strategy is a much better approach.

4. Implementation with RDFa or microformats

RDFa and microformats provide means to include structured data, such as a Resource Map, within an XHTML or HTML page. A profile for use of RDFa to serialize Resource Maps is given in Resource Map Implementation in RDFa. With RDFa and microformats an (X)HTML "Splash Page" may also take on the dual role of a Resource Map serialization.

Within the ORE Model, the URIs of all Resource Maps (ReM-1, ReM-2 etc.) MUST be distinct from the URI of the Aggregation (A-1). Similarly the URI of a Splash Page (S-1) MUST be distinct from the URI of the Aggregation. It is RECOMMENDED that the URI of a Splash Page also be distinct from the URI of the Resource Map if the Splash Page is itself an Aggregated Resource. Suggested ways to do this are included in sections 4.1 and 4.2 below.

4.1 RDFa or microformats with Cool URIs

In the case of a Cool URI implementation, the URI of the (X)HTML page containing the RDFa or microformat Resource Map is treated in the same way as any other Resource Map URI for a given Aggregation. If the HTML page contains the only Resource Map serialization then one might have URIs:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.html      (includes RDFa Resource Map)

If there are multiple serializations then the default content-negotiated result or redirect should be to the HTML page. This will ensure that a web browser receives the most helpful version of the Resource Map in response to an attempt to access the Aggregation with no preference information. If Resource Maps were available in XHTML/RDFa, Atom and RDF/XML the URIs might be:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.html      (includes RDFa Resource Map)
               ReM-2 = http://example.org/foo.atom
               ReM-3 = http://example.org/foo.rdf

If the (X)HTML or Splash Page is itself part of the Aggregation then the Splash Page and Resource Map URIs should be different. In the example set of URIs below, the fragment identifier #rem is used to distinguish the Resource Map from the Splash Page:

Splash Page:   S-1   = http://example.org/foo.html
Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.html#rem  (includes RDFa Resource Map)
               ReM-2 = http://example.org/foo.atom
               ReM-3 = http://example.org/foo.rdf

Alternatively, the server could be configured to support completely separate URIs S-1 and ReM-1 that yield the same XHTML+RDFa or XHTML+microformat document:

Splash Page:   S-1   = http://example.org/splash.html   (access yields same XHTML+RDFa as foo.html)
Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.html      (access yields same XHTML+RDFa as splash.html)
               ReM-2 = http://example.org/foo.atom
               ReM-3 = http://example.org/foo.rdf

4.2 RDFa or microformats without server support

In the case of a simple implementation without server support, the (X)HTML page containing the RDFa or microformat Resource Map serialization is served at the Resource Map URI ReM-1, and the Aggregation URI A-1 is formed by adding a fragment identifier to it:

Aggregation:   A-1   = http://example.org/foo.html#aggregation
Resource Map:  ReM-1 = http://example.org/foo.html

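The RDFa or microformat data must be written so that the URIs above are used in statements. The Aggregation URI is http://example.org/foo.html#aggregation and not the page URI http://example.org/foo.html.

If the (X)HTML Splash Page is itself part of the Aggregation, a second fragment identifier (here #rem) can be used to distinguish the Resource Map from the Splash Page: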

Splash Page:   S-1   = http://example.org/foo.html
Aggregation:   A-1   = http://example.org/foo.html#aggregation
Resource Map:  ReM-1 = http://example.org/foo.html#rem

5. Proxy URIs

The ORE Model [Data Model] introduces Proxy URIs which establish Aggregation-specific identities for Aggregated Resources. From a modelling perspective, Proxy URIs need only be unique to a specific Aggregation and to a specific Aggregated Resource, and have these connections indicated with the appropriate predicates (ore:proxyIn, ore:proxyFor). It is permitted to have multiple Proxy URIs for the same Aggregated Resource in the same Aggregation as described in different Resource Maps. When implemented using HTTP, Proxy URIs SHOULD satisfy the additional requirements given below so that clients dereferencing a Proxy URI will be redirected to the Aggregated Resource while also being informed of the Aggregation context. Conveying this information in responses requires server support.

The ORE Proxy URI resolver provides a way to implement Proxy URIs without the need for local server support. Proxy URIs are constructed as queries to the resolver which contain both the target Aggregated Resource URI and Aggregation context URI.

5.1 Requirements for HTTP Proxy URIs

Proxy URIs MUST be unique to a specific Aggregation (URI-A) and to a specific Aggregated Resource (URI-AR). They are thus able to "stand for" the Aggregated Resource in the context of the particular Aggregation. If an HTTP Proxy URI is used as a reference to an Aggregated Resource in the context of an Aggregation then it is desirable that dereferencing it with a standard web browser will return the Aggregated Resource itself (say a JPEG image or PDF document). In addition, dereferencing the Proxy URI with an ORE aware client or agent should reveal the Aggregation context. In order to meet these two requirements, when dereferenced HTTP Proxy URIs MUST:

Redirect the client to the Aggregated Resource with HTTP status code "303 See Other" (other 3xx status codes do not have the correct semantics) and a Location header:
```
Location: URI-AR
```
Indicate the Aggregation context in the HTTP response with a Link header which is typed with the aggregation relation:
```
Link: <URI-A>; rel="aggregation"
```
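
The two requirements above can be sketched as a pair of helpers: one builds the response a Proxy URI resolver would send, the other shows how an ORE aware client might recover the Aggregation context from it. The names `proxy_response` and `aggregation_context` are assumptions made for this example, and the Link header parsing is deliberately simplified (it expects `rel` to be the first parameter after the target URI).

```python
import re

# Simplified Link header pattern: <target>; rel="relation"
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def proxy_response(uri_ar, uri_a):
    """Return (status, headers) for a dereferenced Proxy URI."""
    return 303, [
        ("Location", uri_ar),                       # redirect to the Aggregated Resource
        ("Link", f'<{uri_a}>; rel="aggregation"'),  # Aggregation context
    ]


def aggregation_context(headers):
    """Return the Aggregation URI from a Link header, or None if absent."""
    for name, value in headers:
        if name.lower() == "link":
            for target, rel in LINK_RE.findall(value):
                if rel.strip() == "aggregation":
                    return target
    return None
```

A client that ignores the Link header simply follows the Location header, while an ORE aware client can call something like `aggregation_context` on the same response.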

The ORE Proxy URI resolver is one implementation that meets these requirements. The particular syntax described below could be reused for other Proxy URI resolvers with different base URIs. With this or other syntaxes, implementers should note the URI encoding issues mentioned below.

5.2 ORE Proxy URI resolver at `http://oreproxy.org/r`

The ORE Proxy URI resolver at http://oreproxy.org/r is provided as a service to the community. Use of the http://oreproxy.org/r resolver requires only that Proxy URIs are constructed by following the syntax rules described here. There is no need to register new Proxy URIs or Resource Maps or Aggregations because all of the information needed to implement the Proxy URI requirements given above is included in the Proxy URI itself, namely the URIs of the Aggregated Resource (URI-AR) and of the Aggregation context (URI-A). The syntax for the Proxy URI is:

http://oreproxy.org/r?what=URI-AR&where=URI-A

and an example might be

http://oreproxy.org/r?what=http://example.org/aggregated_resource_456&where=http://example.org/aggregation_123

Proxy URIs are constructed according to the following rules:

The parameters what and where MUST be given in the order shown.
The URIs of the Aggregated Resource (URI-AR) and of the Aggregation (URI-A) MUST be appropriately URI encoded as parts of the query component of the Proxy URI. All except the following characters should be percent encoded in URI-A and URI-AR when used in the Proxy URI (see URI syntax specification [RFC3986]):
```
query-non-escaped = ALPHA / DIGIT / "-" / "." / "_" / "~" / ":" / "@" / "/" / "?"
```
Note that this means that there MUST be double-escaping of any % characters that are already used to indicate percent encoded characters in URI-A or URI-AR. For example, if URI-AR=http://example.org/aggregated%26resource and URI-A=http://example.org/aggregation_123, the % in %26 must be encoded as %25, giving:
```
http://oreproxy.org/r?what=http://example.org/aggregated%2526resource&where=http://example.org/aggregation_123
```
Note also that it is essential that the # character be correctly escaped (as %23) if either URI-A or URI-AR contains a fragment identifier component. If not, a browser would interpret the # character as the end of the query string and not send the rest of the Proxy URI to the resolver.
It is RECOMMENDED that the scheme and host components be specified in normalized (lowercase) form.
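
The construction rules above can be sketched with Python's standard `urllib.parse.quote`. The function name `make_proxy_uri` and the constant names are assumptions made for this example; the set of unescaped characters follows the query-non-escaped rule above.

```python
from urllib.parse import quote

RESOLVER = "http://oreproxy.org/r"
# quote() never escapes ALPHA, DIGIT, "-", "." or "_"; the remaining
# query-non-escaped characters are listed here explicitly.
SAFE = "~:@/?"


def make_proxy_uri(uri_ar, uri_a, resolver=RESOLVER):
    """Build a Proxy URI with 'what' (URI-AR) before 'where' (URI-A)."""
    what = quote(uri_ar, safe=SAFE)   # "%" becomes "%25", "#" becomes "%23"
    where = quote(uri_a, safe=SAFE)
    return f"{resolver}?what={what}&where={where}"


print(make_proxy_uri(
    "http://example.org/aggregated%26resource",
    "http://example.org/aggregation_123",
))
```

Because `%` and `#` are not in the safe set, the double-escaping of existing percent encodings and the escaping of fragment identifiers both follow from the single `quote` call.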

All applications except the application creating the Proxy URI and the resolver SHOULD treat the Proxy URI as opaque.

When a client dereferences a http://oreproxy.org/r Proxy URI it will be redirected to the Aggregated Resource (URI-AR) and the Aggregation context will be indicated in an HTTP Link header as described in the Proxy URI requirements above. Clients that cannot or do not interpret the Link header, such as an ordinary web browser, will silently be redirected to the Aggregated Resource. ORE aware clients will be able to deduce the Aggregation context.
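
For illustration, the following sketch shows how a resolver using this syntax might recover URI-AR and URI-A from a Proxy URI before issuing the redirect. It is not the implementation of the http://oreproxy.org/r service; the function name `split_proxy_uri` and its error handling are assumptions made for this example.

```python
from urllib.parse import parse_qs, urlsplit


def split_proxy_uri(proxy_uri):
    """Return (URI-AR, URI-A) decoded from a Proxy URI's query component."""
    query = urlsplit(proxy_uri).query
    # The parameters must appear in the order what, where.
    if not query.startswith("what="):
        raise ValueError("query must start with the 'what' parameter")
    params = parse_qs(query)
    if set(params) != {"what", "where"}:
        raise ValueError("expected exactly the parameters 'what' and 'where'")
    if len(params["what"]) != 1 or len(params["where"]) != 1:
        raise ValueError("each parameter must be given exactly once")
    # parse_qs undoes one level of percent encoding, so "%2526" becomes "%26".
    return params["what"][0], params["where"][0]


print(split_proxy_uri(
    "http://oreproxy.org/r?what=http://example.org/aggregated%2526resource"
    "&where=http://example.org/aggregation_123"
))
```

Only one level of percent encoding is removed, so an Aggregated Resource URI that itself contains percent encoded characters is recovered unchanged.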

6. References

[Apache Content Negotiation]: Apache HTTP Server Version 2.2 - Content Negotiation, The Apache Software Foundation, 2008. Available at http://httpd.apache.org/docs/2.2/content-negotiation.html
[Atom Resource Maps]: ORE Specification - Representing Resource Maps Using the Atom Syndication Format, Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, Simeon Warner (editors), 2008-06-02. Available at http://www.openarchives.org/ore/0.9/atom
[Cool URIs]: Cool URIs for the Semantic Web, Leo Sauermann, Richard Cyganiak, Max Völkel, 2007-08-09. Available at http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/ . Also being developed into a W3C Working Draft at http://www.w3.org/TR/cooluris/
[Data Model]: ORE Specification - Abstract Data Model, Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, Simeon Warner (editors), 2008-06-02. Available at http://www.openarchives.org/ore/0.9/datamodel
[Linked Data Tutorial]: Linked Data Tutorial, Chris Bizer, Richard Cyganiak, Tom Heath, 2007-07-27. Available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
[RFC2295]: IETF RFC 2295: Transparent Content Negotiation, K. Holtman, A. Mutz, 1998-03. Available at http://www.ietf.org/rfc/rfc2295.txt
[RFC2616]: IETF RFC 2616: Hypertext Transfer Protocol - HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, 1999-06. Available at http://www.ietf.org/rfc/rfc2616.txt
[RFC3986]: IETF RFC 3986: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, 2005-01. Available at http://www.ietf.org/rfc/rfc3986.txt
[RDFa]: RDFa in XHTML: Syntax and Processing. A collection of attributes and processing rules for extending XHTML to support RDF, Ben Adida, Mark Birbeck, Shane McCarron and Steven Pemberton (editors). W3C Working Draft, 2007-10-18. Available at http://www.w3.org/TR/2007/WD-rdfa-syntax-20071018/ and latest version available at http://www.w3.org/TR/rdfa-syntax/
[RDFa Resource Maps]: ORE User Guide - Resource Map Implementation in RDFa, Carl Lagoze, Herbert Van de Sompel, Pete Johnston, Michael Nelson, Robert Sanderson, Simeon Warner (editors), 2008-06-02. Available at http://www.openarchives.org/ore/0.9/rdfa
[URI Style]: Cool URIs don't change, Tim Berners-Lee, 1998. Available at http://www.w3.org/Provider/Style/URI
[Web Architecture]: Architecture of the World Wide Web, Volume One, I. Jacobs and N. Walsh, Editors, World Wide Web Consortium, 15 January 2004. Available at http://www.w3.org/TR/webarch/

Date	Editor	Description
2008-06-02	simeon	public beta 0.9 release
2008-04-02	simeon	public alpha 0.3 release