ResourceSync Framework Specification - Notification - Beta Draft

24 March 2014

This version:
http://www.openarchives.org/rs/notification/0.9/notification
Latest version:
http://www.openarchives.org/rs/notification
Previous version:
http://www.openarchives.org/rs/notification/0.8.1/notification
Editors:
Martin Klein, Robert Sanderson, Herbert Van de Sompel - Los Alamos National Laboratory
Simeon Warner - Cornell University
Graham Klyne - University of Oxford
Bernhard Haslhofer - University of Vienna
Michael Nelson - Old Dominion University
Carl Lagoze - University of Michigan

Abstract

The ResourceSync core specification introduces a pull-based synchronization framework for the web that consists of various capabilities that a Source can implement to allow Destinations to remain synchronized with its evolving resources. This ResourceSync notification specification describes two additional, push-based, capabilities that a Source can support. Both are aimed at reducing synchronization latency and entail a Source sending notifications to subscribing Destinations.

This specification is one of several documents comprising the ResourceSync Framework Specifications.

Status of this Document

This specification is a beta draft released for public comment. The current choice for the transport protocol for notifications is PubSubHubbub. Other protocols, specifically WebSockets, are also under investigation. Feedback is most welcome on the ResourceSync Google Group.

Table of Contents

1. Introduction
    1.1 Motivating Examples
    1.2 Definitions and Namespace Prefix Bindings
2. Notification Types and Channels
    2.1 Notification Change Types
3. Change Notification
4. Framework Notification
5. Transport Protocol: PubSubHubbub
    5.1 Source Submits Notifications to Hub
    5.2 Destination Subscribes to Hub to Receive Notifications
    5.3 Hub Delivers Notifications to Destination
    5.4 Destination Unsubscribes from Hub
6. Advertising Notification Channels: PubSubHubbub
7. References

Appendices

A. Acknowledgements
B. Change Log

1. Introduction

This specification describes notification capabilities defined for the ResourceSync framework. The push-based notification capabilities are aimed at decreasing the synchronization latency between a Source and a Destination that is inherent in the pull-based capabilities defined in the ResourceSync core specification. Two notification capabilities are specified and both entail a Source sending notifications to subscribing Destinations. The Change Notification capability consists of a Source sending notifications about changes to its resources. The Framework Notification capability consists of a Source sending out notifications about changes to its implementation of the ResourceSync framework, for example the publication of a new Resource List or the updating of a Change List.

1.1. Motivating Examples

Applications based on Linked Data integrate resources from various datasets, with resources likely changing at a different pace. The BBC Linked Data applications that integrate data from, among others, Last.FM, DBpedia, MusicBrainz, and GeoNames serve as examples. The accuracy of services based on such an integrated resource collection depends on the contributing resources being up-to-date. The update frequency of LiveDBPedia resources, for example, has been observed to average around two changes per second. This provides a significant synchronization challenge that the Change Notification capability aims to address.

While the pull-based capabilities specified in the ResourceSync core specification allow Destinations to remain informed about the evolving state of a Source's resources, they do leave the question open as to when a Destination should check whether, for example, a Source has published a new Resource List or has updated a Change List. A pragmatic solution is for Destinations to recurrently poll a Source at a frequency that is based on experience with the pace of prior updates. The Framework Notification capability is about informing Destinations about changes to a Source's ResourceSync environment, thereby providing an explicit trigger to poll a Source, and in doing so removing uncertainty and optimizing the synchronization process. The efficiency gain of this approach is particularly significant in the case of a Source with infrequent changes where Destinations nonetheless require low latency updates.

1.2. Definitions and Namespace Prefix Bindings

This specification uses the terms "resource", "representation", "request", "response", "content negotiation", "client", and "server" as described in [Architecture of the World Wide Web].

Throughout this document, the following namespace prefix bindings are used:

Prefix  Namespace URI                                  Description
none    http://www.sitemaps.org/schemas/sitemap/0.9    Sitemap XML elements defined in the Sitemap protocol
rs      http://www.openarchives.org/rs/terms/          Namespace for elements and attributes introduced in this specification

Table 1.1: Namespace prefix bindings used in this document

2. Notification Types and Channels

Notifications are applied at two distinct levels in the ResourceSync framework:

  1. Change Notifications are sent to inform Destinations about resource change events, for example, when a Source's resource that is subject to synchronization was created, updated, or deleted. Details are provided in Section 3.
  2. Framework Notifications are sent to inform Destinations about changes to capabilities of the ResourceSync framework, for example, if a Source's Change List or Capability List was created, updated, or deleted. Details are provided in Section 4.

As is the case with all capabilities in the ResourceSync framework, the Change Notification and Framework Notification capabilities are independent and can be implemented in a modular manner.

Notifications are sent from Source to Destination using a push technology. Change Notifications and Framework Notifications are sent on different notification channels. The payloads for these notifications are described in Section 3 and Section 4, respectively. The transport protocol used to send notifications is discussed in Section 5.

Figure 1 displays the structure of the ResourceSync framework for a Source that has a single set of resources, showing the Source Description and the Capability List at the top. The Capability List advertises four distinct capabilities: a Resource List, a Change List, a Resource Dump, and a Change Dump. The figure also shows a Framework Notification channel (red hexagon) and a Change Notification channel (yellow hexagon) and indicates the levels of the framework they apply to:

A Framework- and a Change Notification channel

Figure 1: A Framework- and a Change Notification channel in the ResourceSync framework structure

The ResourceSync framework allows a Source to offer multiple sets of resources in which case the Source Description points to multiple Capability Lists, one for each set of resources. This scenario has the following implications for the notification capabilities:

Figure 2 depicts a scenario where a Source offers multiple sets of resources and its Source Description therefore points to multiple Capability Lists, one for each set of resources, in this case Capability List 1 and Capability List 2. Figure 2 shows that each set of resources has a designated Change Notification and Framework Notification channel. Change Notification channel 1, for example, is used to send change notifications about changes to resources that are part of the set of resources covered by Capability List 1. In contrast, Framework Notification channel 2 is used to send notifications about changes to the capability documents advertised by Capability List 2 and about changes to Capability List 2 itself. Notifications about changes to the Source Description are sent via Framework Notification channel 1 and Framework Notification channel 2.

Framework and Change Notification channels for multiple Capability Lists

Figure 2: Framework and Change Notification channels for multiple sets of resources

2.1. Notification Change Types

The following table provides an overview of the possible change types that Change Notifications and Framework Notifications inform about within the ResourceSync framework.

Capability                   Change Type (Create, Update, Delete)
Change Notification
    Individual Resource      X X X
Framework Notification
    Resource List            X X
    Resource Dump            X
    Change List              X X
    Change Dump              X X
    Capability List          X X X
    Source Description       X X X

Table 2.1: Notification Change Types

Note that the creation and deletion of Change Notification channels and Framework Notification channels is reflected in updated Capability Lists (see Section 6). This specification does not define a separate notification about notification channels.

3. Change Notification

A change notification is sent on the appropriate Change Notification channel, as described in Section 2, if a Source wishes to notify a Destination that one or more of its resources subject to synchronization have changed. By subscribing to a Change Notification channel, a Destination can reduce synchronization latency and avoid periodically polling the Source's Change Lists to determine whether resource changes have occurred.

The format of a change notification is very similar to the Change List format introduced in Section 10 of the core specification except that it contains entries only for those changes that have occurred since the previous change notification was sent on the channel. It is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:

Change notifications do not use the <sitemapindex> document format introduced by the Sitemap protocol. In the event that there are a very large number of simultaneous changes at a Source, the notifications must be split into a sequence of change notifications using <urlset> documents.
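
The splitting requirement above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name build_change_notifications, the tuple layout of changes, and the max_entries limit are assumptions introduced for the example, and a real Source would pick a limit that suits its hub.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"


def build_change_notifications(changes, max_entries=1000):
    """Split changes into a sequence of <urlset> payloads (bytes).

    changes is a list of (uri, lastmod, change) tuples. max_entries is a
    hypothetical per-notification limit; the specification does not
    prescribe a value.
    """
    # Serialize Sitemap elements without a prefix and ResourceSync ones as rs:.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("rs", RS_NS)
    payloads = []
    for start in range(0, len(changes), max_entries):
        urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
        for uri, lastmod, change in changes[start:start + max_entries]:
            url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
            ET.SubElement(url, f"{{{RS_NS}}}md", {"change": change})
        payloads.append(
            ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
        )
    return payloads
```

Each returned payload is a complete <urlset> document that can be sent as its own change notification, in order.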

Example 3.1 shows the payload of a change notification containing the description of changes to two resources.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
   <url>
      <loc>http://example.com/res1</loc>
      <lastmod>2013-01-02T13:03:00Z</lastmod>
      <rs:md change="updated"
             hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
             length="8876"
             type="text/html"/>
   </url>
   <url>
      <loc>http://example.com/res2</loc>
      <lastmod>2013-01-02T13:23:00Z</lastmod>
      <rs:md change="updated"
             hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e 
             sha-256:854f61290e2e197a11bc91063afce22e43f8ccc655237050ace766adc68dc784"
             length="14599"
             type="application/pdf"/>
   </url>
</urlset>

Example 3.1: The payload of a change notification
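
A Destination could read a payload such as the one in Example 3.1 along the following lines. This is a sketch only: the function name parse_change_notification and the dictionary keys are assumptions made for illustration, and the sm prefix is a local alias for the Sitemap namespace, which the payload itself uses without a prefix.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def parse_change_notification(payload):
    """Return one dict per <url> entry of a change notification payload."""
    root = ET.fromstring(payload)
    changes = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        changes.append({
            "uri": url.findtext("sm:loc", namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", namespaces=NS),
            "change": md.get("change") if md is not None else None,
            # The hash attribute may hold several whitespace-separated digests.
            "hashes": md.get("hash", "").split() if md is not None else [],
        })
    return changes
```

The Destination can then act on each entry, for example by fetching the resource at uri when change is "updated" and comparing the result against the listed digests.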

4. Framework Notification

A framework notification is sent on the appropriate Framework Notification channel, as described in Section 2, if a Source wishes to notify a Destination about changes to Resource Lists, Change Lists, Resource Dumps, Change Dumps, Capability Lists, and Source Descriptions. By subscribing to a Framework Notification channel, Destinations can refrain from periodically pulling these documents to determine whether they changed.

The format of a framework notification is very similar to the Change List format introduced in Section 10 of the core specification. It is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:

Framework notifications do not use the <sitemapindex> document format introduced by the Sitemap protocol.

Example 4.1 shows the payload of a framework notification informing the Destination about the availability of a new Resource List.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
   <url>
      <loc>http://example.com/resourceset1/resourcelist.xml</loc>
      <rs:md change="created"
             capability="resourcelist"/>
   </url>
</urlset>

Example 4.1: The payload of a framework notification that informs about the availability of a new Resource List

As shown in Figure 1 and Figure 2, framework notifications are never sent at the index level. If the Source sends a framework notification about the change to a document (e.g., a Resource List) that resides under an index, it must provide a <rs:ln> child element to the <url> element in which that change is communicated. The relation type of that link must be index, and the target of the link must be the index (e.g., the Resource List Index) that the changed document resides under.

It is likely that framework notifications only contain information about a single change to the framework. However, multiple such changes can be aggregated into a single framework notification. Example 4.2 shows the payload of a framework notification informing the Destination about a new Resource List, a new Resource Dump, and about an updated Change List. The Resource List resides under an index and hence the corresponding <url> element has a <rs:ln> child element with the relation type index. Note that the framework notification only contains one entry for one new Resource List that resides under an index even though the index likely points to other new Resource Lists.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
   <url>
      <loc>http://example.com/dataset1/resourcelist.xml</loc>
      <rs:md change="created"
             capability="resourcelist"/>
      <rs:ln rel="index"
             href="http://example.com/dataset1/resourcelist-index.xml"/>
   </url>
   <url>
      <loc>http://example.com/dataset1/resourcedump.xml</loc>
      <rs:md change="created"
             capability="resourcedump"/>
   </url>
   <url>
      <loc>http://example.com/dataset1/changelist.xml</loc>
      <rs:md change="updated"
             capability="changelist"/>
   </url>
</urlset>

Example 4.2: A framework notification informing about multiple framework changes
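
The entries of a framework notification such as Example 4.2, including the optional link to an index, could be extracted as sketched below. The function name parse_framework_notification and the shape of the returned tuples are assumptions made for illustration; they are not defined by this specification.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def parse_framework_notification(payload):
    """Return (capability, change, uri, index_uri) for every <url> entry.

    index_uri is None unless the entry carries an <rs:ln rel="index"> link.
    """
    root = ET.fromstring(payload)
    events = []
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        index_uri = None
        for ln in url.findall("rs:ln", NS):
            if ln.get("rel") == "index":
                index_uri = ln.get("href")
        events.append((
            md.get("capability") if md is not None else None,
            md.get("change") if md is not None else None,
            url.findtext("sm:loc", namespaces=NS),
            index_uri,
        ))
    return events
```

A Destination might use the capability and change values to decide which documents to fetch again, and the index URI to locate sibling documents under the same index.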

5. Transport Protocol: PubSubHubbub

In order to bootstrap the notification capabilities of the ResourceSync framework, a single transport protocol is chosen: PubSubHubbub [PubSubHubbub]. PubSubHubbub is a simple, HTTP-based publish/subscribe protocol that is expected to perform well for framework notifications and for use cases that do not require change notifications to be sent at a very high frequency. Another transport protocol, WebSockets [RFC 6455], will be explored and may be added in the future.

Table 5.1 maps terminology used in ResourceSync and PubSubHubbub. In order to implement the publish/subscribe paradigm, PubSubHubbub introduces a hub that acts as a conduit between Source and Destination. A hub can be operated by the Source itself or by a third party. It is uniquely identified by the hub URI. PubSubHubbub's topic corresponds with the notion of channel used in this specification. A topic is uniquely identified by its topic URI. Hence, per set of resources, the Source has a dedicated topic (and hence topic URI) for its change notifications and framework notifications, respectively.

ResourceSync    PubSubHubbub
Source          Publisher
Destination     Subscriber
Channel         Topic
Notification    Notification
(none)          Hub

Table 5.1: Mapping of terminologies between ResourceSync and PubSubHubbub

The remainder of this section describes the use of PubSubHubbub in ResourceSync. It only provides the information about the PubSubHubbub protocol that is essential to gain an adequate understanding of the overall mechanism. Details about the PubSubHubbub protocol are available in the specification [PubSubHubbub]. Figure 3 shows an overview of HTTP interactions between Source, Hub, and Destination. They will be detailed in the remainder of this section.

HTTP interactions between Source, Hub, and Destination

Figure 3: HTTP interactions between Source, Hub, and Destination

5.1. Source Submits Notifications to Hub

The PubSubHubbub protocol provides no specific guidelines regarding the way in which a Source should communicate notifications to a hub. The mechanism for ResourceSync notifications is as follows:

Example 5.1 shows the HTTP POST issued by the Source against its hub to submit the change notification payload of Example 3.1. For brevity, the payload is not shown in its entirety. The third party hub URI is http://hub.example.org/pubsubhubbub/ and the Source's topic URI (channel) for change notifications pertaining to dataset1 is http://example.com/dataset1/change/.

POST /pubsubhubbub/ HTTP/1.1
Host: hub.example.org
Content-Type: application/xml
Link: <http://example.com/dataset1/change/> ; rel="self",
 <http://hub.example.org/pubsubhubbub/> ; rel="hub", 
 <http://www.example.com/dataset1/capabilitylist.xml> ; rel="resourcesync"
Content-Length: 849

<?xml version="1.0" encoding="UTF-8"?>
<urlset ...

Example 5.1: The HTTP POST used by a Source to submit a change notification payload to its hub
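
A request shaped like Example 5.1 could be assembled as sketched below. This is an illustration under stated assumptions: the function name build_hub_submission is hypothetical, the headers simply mirror those shown in Example 5.1, and a given hub may expect a different submission interface since PubSubHubbub does not standardize this step. The request is only constructed here, not sent.

```python
import urllib.request


def build_hub_submission(hub_uri, topic_uri, capability_list_uri, payload):
    """Build (but do not send) the POST a Source issues against its hub.

    payload is the notification document as bytes. The Link header carries
    the topic URI (rel="self"), the hub URI (rel="hub"), and the Capability
    List URI (rel="resourcesync"), as in Example 5.1.
    """
    link = (
        f'<{topic_uri}>; rel="self", '
        f'<{hub_uri}>; rel="hub", '
        f'<{capability_list_uri}>; rel="resourcesync"'
    )
    return urllib.request.Request(
        hub_uri,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/xml", "Link": link},
    )
```

Passing the returned object to urllib.request.urlopen would send it; the Host and Content-Length headers are filled in at that point.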

5.2. Destination Subscribes to Hub to Receive Notifications

A Destination subscribes to a Source's topic using the process described in the section "Subscribing and Unsubscribing" of PubSubHubbub. The process consists of mandatory subscription request and subscription verification phases:

Example 5.2 shows the HTTP POST issued by a Destination against the hub URI http://hub.example.org/pubsubhubbub/ requesting a subscription to the Source's topic URI (channel) http://example.com/dataset1/change/ as a means to receive change notifications pertaining to dataset1 at its callback URI http://destination.example.net/callback/.

POST /pubsubhubbub/ HTTP/1.1
Host: hub.example.org
Content-Type: application/x-www-form-urlencoded
Content-Length: 163

hub.mode=subscribe&hub.topic=http%3A%2F%2Fexample.com%2Fdataset1%2Fchange%2F
&hub.callback=http%3A%2F%2Fdestination.example.net%2Fcallback%2F&hub.lease_seconds=3600

Example 5.2: A Destination's request to a hub to subscribe to a Source's notification channel
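
The form-encoded body of such a subscription request could be produced as sketched below. The function name build_subscription_body and its parameters are assumptions made for illustration; the hub.* field names are those shown in Example 5.2.

```python
from urllib.parse import urlencode


def build_subscription_body(topic_uri, callback_uri, mode="subscribe",
                            lease_seconds=None):
    """Return the application/x-www-form-urlencoded body as bytes.

    mode is "subscribe" or "unsubscribe"; lease_seconds is optional.
    """
    params = [
        ("hub.mode", mode),
        ("hub.topic", topic_uri),
        ("hub.callback", callback_uri),
    ]
    if lease_seconds is not None:
        params.append(("hub.lease_seconds", str(lease_seconds)))
    # urlencode percent-encodes ":" and "/" in the URI values.
    return urlencode(params).encode("ascii")
```

The length of the returned bytes is the value to send as Content-Length. Calling the function with mode="unsubscribe" yields the body for the request described in Section 5.4.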

Example 5.3 shows the HTTP GET issued by the hub against the Destination's callback URI to verify that it was the Destination's intent to subscribe.

GET /callback/?hub.mode=subscribe&hub.topic=http%3A%2F%2Fexample.com%2Fdataset1%2Fchange%2F
&hub.challenge=c0cc4630-5116-11e3-8f96-0800200c9a66&hub.lease_seconds=2400 HTTP/1.1
Host: destination.example.net
Connection: Close

Example 5.3: A hub's request to verify a Destination's intent to subscribe

Example 5.4 shows the response by a Destination to the hub's subscription verification request of Example 5.3. It indicates that the Destination wants the subscription.

HTTP/1.1 200 OK
Date: Tue, 19 Nov 2013 12:42:13 GMT
Content-Type: text/plain; charset=UTF-8
Content-Length: 36
Connection: Close

c0cc4630-5116-11e3-8f96-0800200c9a66

Example 5.4: A Destination's response to a hub's subscription verification request
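
The Destination's side of the verification exchange can be sketched as a small function that maps the hub's query string to a response. This is a simplified illustration: the function name verify_subscription and the pending set are assumptions, and the 404 response for unwanted requests reflects our reading of the PubSubHubbub specification, which should be consulted for the authoritative rules.

```python
from urllib.parse import parse_qs


def verify_subscription(query_string, pending):
    """Return (status_code, body) for a hub verification request.

    pending is a set of (mode, topic_uri) pairs that this Destination has
    actually requested. The challenge is echoed only for those pairs.
    """
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if challenge and (mode, topic) in pending:
        # Confirm intent by returning the challenge verbatim.
        return 200, challenge
    # Decline anything this Destination did not ask for.
    return 404, ""
```

Wired into any HTTP server framework, the returned status code and body would form a response like the one in Example 5.4.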

5.3. Hub Delivers Notifications to Destination

When the hub receives a change notification or framework notification from the Source, it passes it on to the subscribing Destination(s). The process, shown as "Hub notifies Destination" in Figure 3, is as follows:

Example 5.5 shows the HTTP POST that the hub issues against the Destination's callback URI to relay the notification it received from the Source in Example 5.1. For brevity, the payload is not shown in its entirety.

POST /callback/ HTTP/1.1
Host: destination.example.net
Content-Type: application/xml
Link: <http://example.com/dataset1/change/> ; rel="self",
 <http://hub.example.org/pubsubhubbub/> ; rel="hub", 
 <http://www.example.com/dataset1/capabilitylist.xml> ; rel="resourcesync"
Content-Length: 849

<?xml version="1.0" encoding="UTF-8"?>
<urlset ...

Example 5.5: The HTTP POST used by a hub to submit a Source's change notification payload to a Destination
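
On receipt of such a POST, a Destination can use the Link header to determine which channel the notification belongs to. The sketch below is deliberately simplified and is an assumption of this example rather than a complete Link header parser: the name parse_link_header is hypothetical, and the pattern only handles entries of the form <URI> ; rel="value" as they appear in Example 5.5.

```python
import re

# Matches <URI> followed by a quoted rel parameter; other parameters and
# unquoted or multi-valued rel values are not handled in this sketch.
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_header(value):
    """Return a dict mapping each rel value to its target URI."""
    return {rel: uri for uri, rel in LINK_RE.findall(value)}
```

The "self" entry gives the topic URI (channel), which lets a Destination that shares one callback URI across several subscriptions route the payload to the right handler.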

5.4. Destination Unsubscribes from Hub

The mechanism by which a Destination unsubscribes from a Source's topic URI is as described in Section 5.2 but uses unsubscribe as the value of hub.mode instead of subscribe.

6. Advertising Notification Channels: PubSubHubbub

Notification capabilities are advertised via Capability Lists, as is the case with the capabilities defined in the core ResourceSync specification. As both the Change Notification channel and the Framework Notification channel are dedicated to a particular set of resources, they are advertised in the Capability List that corresponds with the set of resources.

Figure 4 displays a Change Notification channel and a Framework Notification channel advertised in a Capability List. The figure shows a structure with only one Capability List that advertises its designated notification channels. Other Capability Lists, each of which pertains to a different set of resources, would advertise their respective notification channels. The displayed Capability List could, in addition to the notification channels, advertise other capabilities such as a Resource List or a Change List as introduced in the core specification, and also advertise archive capabilities as introduced in the archiving specification.

Framework and Change Notification channel discovery

Figure 4: Framework and Change Notification channel discovery

Example 6.1 shows the Capability List from Example 7.1 of the core specification with discovery links for a Change Notification channel and a Framework Notification channel added. The PubSubHubbub topic URI is provided in the <loc> element, whereas the hub URI is provided using a <rs:ln> child element of the same <url> element. That <rs:ln> must have hub as the value of the rel attribute and the hub URI as the value of the href attribute. Note the introduction of the change-notification and framework-notification values for the capability attribute to indicate the Change Notification and Framework Notification capabilities, respectively.

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
   <rs:ln rel="describedby"
          href="http://example.com/info_about_set1_of_resources.xml"/>
   <rs:ln rel="up"
          href="http://example.com/source_description.xml"/>
   <rs:md capability="capabilitylist"/>
   <url>
       <loc>http://example.com/dataset1/resourcelist.xml</loc>
       <rs:md capability="resourcelist"/>
   </url>
   <url>
       <loc>http://example.com/dataset1/resourcedump.xml</loc>
       <rs:md capability="resourcedump"/>
   </url>
   <url>
       <loc>http://example.com/dataset1/changelist.xml</loc>
       <rs:md capability="changelist"/>
   </url>
   <url>
       <loc>http://example.com/dataset1/changedump.xml</loc>
       <rs:md capability="changedump"/>
   </url>
   <url>
      <loc>http://example.com/dataset1/change/</loc>
      <rs:ln rel="hub" href="http://hub.example.org/pubsubhubbub/"/>
      <rs:md capability="change-notification"/>
   </url>
   <url>
      <loc>http://example.com/dataset1/framework/</loc>
      <rs:ln rel="hub" href="http://hub.example.org/pubsubhubbub/"/>
      <rs:md capability="framework-notification"/>
   </url>
</urlset>

Example 6.1: A Capability List with entries to discover PubSubHubbub notification channels
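
A Destination could discover the topic URI and hub URI of each notification channel from a Capability List such as Example 6.1 as sketched below. The function name discover_channels and the shape of the returned dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}
NOTIFICATION_CAPABILITIES = ("change-notification", "framework-notification")


def discover_channels(capability_list):
    """Map each notification capability to its topic URI and hub URI."""
    root = ET.fromstring(capability_list)
    channels = {}
    for url in root.findall("sm:url", NS):
        md = url.find("rs:md", NS)
        capability = md.get("capability") if md is not None else None
        if capability not in NOTIFICATION_CAPABILITIES:
            continue
        # The hub URI is carried by the <rs:ln rel="hub"> sibling of <loc>.
        hub = next(
            (ln.get("href") for ln in url.findall("rs:ln", NS)
             if ln.get("rel") == "hub"),
            None,
        )
        channels[capability] = {
            "topic": url.findtext("sm:loc", namespaces=NS),
            "hub": hub,
        }
    return channels
```

The topic and hub values returned for a capability are the inputs a Destination needs for the subscription request of Section 5.2.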

7. References

[PubSubHubbub]
IETF PubSubHubbub Core, Fitzpatrick, B., Slatkin, B., Atkins, M., Genestoux, J., June 2013
[RFC 6455]
IETF RFC 6455: The WebSocket Protocol, I. Fette, A. Melnikov, December 2011.

A. Acknowledgements

This specification is the collaborative work of NISO and the Open Archives Initiative. Funding for ResourceSync is provided by the Alfred P. Sloan Foundation. UK participation is supported by Jisc.

The names of individual contributors will be listed here when the final specification is released.

B. Change Log

Date Editor Description
2014-03-24 graham, herbert version 0.9, removed ResourceSync-specific requirements from communication between Source and hub
2013-12-18 herbert, martin, rob, simeon version 0.8.1, using PubSubHubbub
2013-11-12 martin, herbert, rob, simeon version 0.8, using WebSockets

Creative Commons License
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.