Open Archives Initiative ResourceSync Framework Specification
The ResourceSync core specification introduces a pull-based synchronization framework for the web that consists of various capabilities that a Source can implement to allow Destinations to remain synchronized with its evolving resources. This ResourceSync notification specification describes two additional, push-based, capabilities that a Source can support. Both are aimed at reducing synchronization latency and entail a Source sending notifications to subscribing Destinations.
This specification is one of several documents comprising the ResourceSync Framework Specifications.
This specification is an alpha draft released for public comment. For initial experiments, PubSubHubbub, rather than the WebSockets approach described here, will be used as the push technology. Feedback is most welcome on the ResourceSync Google Group.
1. Introduction
1.1 Motivating Examples
1.2 Notational Conventions
2. Notification Types and Channels
2.1 Notification Change Types
3. Advertising Notification Channels
4. Change Notification
5. Framework Notification
6. Websockets
7. References
A. Acknowledgements
B. Change Log
This specification describes notification capabilities defined for the ResourceSync framework. The push-based notification capabilities are aimed at decreasing the synchronization latency between a Source and a Destination that is inherent in the pull-based capabilities defined in the ResourceSync core specification. Two notification capabilities are specified and both entail a Source sending notifications to subscribing Destinations. The Change Notification capability consists of a Source sending notifications about changes to its resources. The Framework Notification capability consists of a Source sending out notifications about changes to its implementation of the ResourceSync framework, for example the publication of a new Resource List or the updating of a Change List.
Applications based on Linked Data integrate resources from various datasets, and the resources in each dataset are likely to change at a different pace. The BBC Linked Data applications, which integrate data from Last.FM, DBpedia, MusicBrainz, and GeoNames among others, serve as examples. The accuracy of services based on such an integrated resource collection depends on the contributing resources being up-to-date. For example, LiveDBPedia resources have been observed to change at an average rate of around two changes per second. This poses a significant synchronization challenge that the Change Notification capability aims to address.
The pull-based capabilities specified in the ResourceSync core specification allow Destinations to remain informed about the evolving state of a Source's resources. However, they leave open the question of when a Destination should check whether a Source has, for example, published a new Resource List or updated a Change List. A pragmatic solution is for Destinations to poll a Source repeatedly, at a frequency based on experience with the pace of prior updates. The Framework Notification capability informs Destinations about changes to a Source's ResourceSync environment. It thereby provides an explicit trigger to poll the Source, which removes uncertainty and optimizes the synchronization process.
This specification uses the terms "resource", "representation", "request", "response", "content negotiation", "client", and "server" as described in [Architecture of the World Wide Web].
Throughout this document, the following namespace prefix bindings are used:
Prefix | Namespace URI | Description |
---|---|---|
(default) | http://www.sitemaps.org/schemas/sitemap/0.9 | Sitemap XML elements defined in the Sitemap protocol |
rs | http://www.openarchives.org/rs/terms/ | Namespace for elements and attributes introduced in this specification |
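Notifications are applied at two distinct levels in the ResourceSync framework. Change Notifications pertain to changes to a Source's resources. Framework Notifications pertain to changes to the Source's implementation of the ResourceSync framework, such as its Resource Lists, Change Lists, Capability Lists, and Source Description.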
Notifications are sent from Source to Destination using a push technology. Change Notifications and Framework Notifications are sent on different notification channels. The push technology used by this specification is WebSockets [RFC 6455] (see Section 6 for details). WebSockets allow distinct channels to be defined using a different socket for each channel.
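Figure 1 displays the structure of the ResourceSync framework for a Source that has a single set of resources, with the Source Description and the Capability List at the top. The Capability List advertises four distinct capabilities: a Resource List, a Change List, a Resource Dump, and a Change Dump. The figure also shows a Framework Notification channel (red hexagon) and a Change Notification channel (yellow hexagon), and it indicates the levels of the framework to which they apply:

- The Change Notification channel applies to the level of individual resources.
- The Framework Notification channel applies to the Source Description, the Capability List, and the capability documents the Capability List advertises. Framework notifications are not sent at the index level. Instead, a framework notification about a document that resides under an index contains a link with the relation type index pointing at that index. This allows Destinations to navigate to the index and detect further changes there.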
For example, a framework notification about the creation of a new Resource List that resides under a Resource List Index must contain an index link pointing at that Resource List Index.

The ResourceSync framework allows a Source to offer multiple sets of resources. In that case, the Source Description points to multiple Capability Lists, one for each set of resources. This scenario has implications for the notification capabilities. Figure 2 illustrates them for a Source with two sets of resources, described by Capability List 1 and Capability List 2.
Figure 2 shows that each set of resources has a designated Change Notification channel and a designated Framework Notification channel. Change Notification channel 1, for example, is used to send notifications about changes to resources that are part of the set of resources covered by Capability List 1. In contrast, Framework Notification channel 2 is used to send notifications about changes to the capability documents advertised by Capability List 2 and about changes to Capability List 2 itself. Notifications about changes to the Source Description are sent via both Framework Notification channel 1 and Framework Notification channel 2.
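The routing rule just described can be sketched on the Source side as follows. This is a minimal illustration, not part of the specification. The `ResourceSet` model, the function name `framework_channels`, and the dataset2 channel URIs in the usage note are hypothetical. The sketch assumes that the Source already knows which set of resources covers each changed document.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ResourceSet:
    """Hypothetical model of one set of resources and its designated channels."""
    capability_list: str     # URI of the Capability List describing the set
    change_channel: str      # Change Notification channel for the set's resources
    framework_channel: str   # Framework Notification channel for the set


def framework_channels(capability: str, owner: Optional[ResourceSet],
                       sets: List[ResourceSet]) -> List[str]:
    """Return the Framework Notification channel(s) on which to announce a change.

    `capability` is the capability value of the changed document; `owner` is
    the set of resources whose Capability List covers that document.
    """
    if capability == "description":
        # A Source Description change is announced on every Framework
        # Notification channel, one per set of resources.
        return [s.framework_channel for s in sets]
    if owner is None:
        raise ValueError("only the Source Description may lack an owning set")
    # A Capability List and the documents it advertises use their own set's channel.
    return [owner.framework_channel]
```

For example, with two sets, `framework_channels("description", None, [s1, s2])` yields both framework channels. In contrast, `framework_channels("capabilitylist", s2, [s1, s2])` yields only the channel of `s2`.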
The following table provides an overview of the possible change types that Change Notifications and Framework Notifications inform about within the ResourceSync framework.
Capability | Create | Update | Delete |
---|---|---|---|
Change Notification | | | |
Individual Resource | X | X | X |
Framework Notification | | | |
Resource List | X | | |
Resource Dump | X | | |
Change List | X | X | |
Change Dump | X | X | |
Capability List | X | X | X |
Source Description | X | X | X |
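A Source or Destination might encode the table above as a lookup in order to check notification entries. The following is a minimal sketch. The key `"resource"` is a hypothetical label for individual resources. The other keys are the `capability` attribute values used in framework notifications. The name `is_allowed` is illustrative only.

```python
# Change types listed in the table above, keyed by what has changed.
# "resource" is a hypothetical key for individual resources (Change Notification);
# the remaining keys are capability attribute values (Framework Notification).
ALLOWED_CHANGES = {
    "resource": {"created", "updated", "deleted"},
    "resourcelist": {"created"},
    "resourcedump": {"created"},
    "changelist": {"created", "updated"},
    "changedump": {"created", "updated"},
    "capabilitylist": {"created", "updated", "deleted"},
    "description": {"created", "updated", "deleted"},
}


def is_allowed(capability: str, change: str) -> bool:
    """Return True if the table lists `change` for `capability`."""
    return change in ALLOWED_CHANGES.get(capability, set())
```

For instance, `is_allowed("changelist", "updated")` is true, whereas `is_allowed("resourcelist", "updated")` is false.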
Note that the creation and deletion of Change Notification channels and Framework Notification channels are reflected in updated Capability Lists (see Section 3). This specification does not define notifications about notification channels.
Notification capabilities are advertised via Capability Lists, as are the capabilities defined in the core ResourceSync specification. Both the Change Notification channel and the Framework Notification channel are dedicated to a particular set of resources, so they are advertised in the Capability List that corresponds to that set of resources.
Figure 3 displays a Change Notification channel and a Framework Notification channel advertised in a Capability List. The figure shows a structure with only one Capability List, which advertises its designated notification channels. Other Capability Lists, each of which pertains to a different set of resources, would advertise their respective notification channels. In addition to the notification channels, the displayed Capability List could advertise other capabilities, such as a Resource List or a Change List as introduced in the core specification. It could also advertise archive capabilities as introduced in the archiving specification.
Example 3.1 shows the Capability List from Example 7.1 of the core specification, with discovery links added for a Change Notification channel and a Framework Notification channel. The URIs identifying the WebSocket channels are provided in the <loc> elements. Note the introduction of the change-notification and framework-notification values for the capability attribute. These values indicate the Change Notification and Framework Notification capabilities, respectively.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="describedby"
         href="http://example.com/info_about_set1_of_resources.xml"/>
  <rs:ln rel="up"
         href="http://example.com/source_description.xml"/>
  <rs:md capability="capabilitylist"/>
  <url>
    <loc>http://example.com/dataset1/resourcelist.xml</loc>
    <rs:md capability="resourcelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/resourcedump.xml</loc>
    <rs:md capability="resourcedump"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changelist.xml</loc>
    <rs:md capability="changelist"/>
  </url>
  <url>
    <loc>http://example.com/dataset1/changedump.xml</loc>
    <rs:md capability="changedump"/>
  </url>
  <url>
    <loc>ws://example.com/channels/dataset1/change_notification_channel</loc>
    <rs:md capability="change-notification"/>
  </url>
  <url>
    <loc>ws://example.com/channels/dataset1/framework_notification_channel</loc>
    <rs:md capability="framework-notification"/>
  </url>
</urlset>
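A Destination has to locate the channel URIs in a Capability List before it can subscribe. The following is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It assumes that the Capability List has already been retrieved as a string. The function name `notification_channels` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"
NOTIFICATION_CAPABILITIES = ("change-notification", "framework-notification")


def notification_channels(capability_list_xml: str) -> Dict[str, str]:
    """Map each advertised notification capability to its channel URI."""
    root = ET.fromstring(capability_list_xml)
    channels: Dict[str, str] = {}
    # Only <url> children of <urlset> advertise capabilities; the top-level
    # <rs:md capability="capabilitylist"/> is not a direct <url> child.
    for url in root.findall(f"{SM}url"):
        md = url.find(f"{RS}md")
        loc = url.findtext(f"{SM}loc")
        if md is not None and loc and md.get("capability") in NOTIFICATION_CAPABILITIES:
            channels[md.get("capability")] = loc.strip()
    return channels
```

Applied to Example 3.1, this returns the two `ws://` URIs keyed by `change-notification` and `framework-notification`. The Destination would then open a WebSocket connection to each of them.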
A change notification is sent on the appropriate Change Notification channel, as described in Section 2, if a Source wishes to notify a Destination that one or more of its resources subject to synchronization have changed. By subscribing to a Change Notification channel, a Destination can reduce synchronization latency and avoid periodically polling the Source's Change Lists to determine whether resource changes have occurred.
The format of a change notification is very similar to the Change List format introduced in Section 10 of the core specification. It is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:

- An <rs:ln> child element of <urlset> points to the Capability List with the relation type up.
- One <url> child element of <urlset> per resource change. This element does not have attributes, but uses child elements to convey information about the changed resource. The <url> element has the following child elements:
  - A <loc> child element provides the URI of the changed resource.
  - A <lastmod> child element with semantics as described in Section 5 of the core specification. All entries in a change notification must be provided in forward chronological order: the least recently changed resource must be listed at the beginning of the change notification payload, and the most recently changed resource must be listed at the end.
  - An <rs:md> child element must have the attribute change to convey the nature of the change. Its value can be created, updated, or deleted. It can further have the attributes hash, length, and type, as described in Section 5 of the core specification.
  - <rs:ln> child elements link to related resources as described in Section 5 and Section 12 of the core specification.
Change notifications do not use the <sitemapindex> document format introduced by the Sitemap protocol. If a very large number of changes occur simultaneously at a Source, the notifications must be split into a sequence of change notifications.
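A Source could assemble a change notification with the structure described above along the following lines. This is a minimal sketch with the standard library, not a reference implementation. The `Change` record and the `change_notification` function are hypothetical names. Ordering is done by comparing `lastmod` strings, which is correct only under the stated assumption that every timestamp uses the same UTC `...Z` format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SM_NS)   # serialize Sitemap elements unprefixed
ET.register_namespace("rs", RS_NS)


@dataclass
class Change:
    uri: str
    lastmod: str                      # assumed UTC W3C datetime, e.g. "2013-01-02T13:03:00Z"
    change: str                       # "created", "updated" or "deleted"
    hash_value: Optional[str] = None  # written as the rs:md hash attribute
    length: Optional[int] = None      # written as the rs:md length attribute
    mime_type: Optional[str] = None   # written as the rs:md type attribute


def change_notification(capability_list: str, changes: List[Change]) -> str:
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    ET.SubElement(urlset, f"{{{RS_NS}}}ln", rel="up", href=capability_list)
    # Forward chronological order: least recently changed resource first.
    for c in sorted(changes, key=lambda c: c.lastmod):
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = c.uri
        ET.SubElement(url, f"{{{SM_NS}}}lastmod").text = c.lastmod
        md = ET.SubElement(url, f"{{{RS_NS}}}md", change=c.change)
        optional = (("hash", c.hash_value), ("length", c.length), ("type", c.mime_type))
        for name, value in optional:
            if value is not None:
                md.set(name, str(value))
    return ET.tostring(urlset, encoding="unicode")
```

Sorting inside the builder means callers can collect events in any order. A production Source would more likely parse timestamps into `datetime` objects rather than rely on string order.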
Example 4.1 shows the payload of a change notification containing the description of changes to two resources.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="up"
         href="http://example.com/dataset1/capabilitylist.xml"/>
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-02T13:03:00Z</lastmod>
    <rs:md change="updated"
           hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6"
           length="8876"
           type="text/html"/>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-02T13:23:00Z</lastmod>
    <rs:md change="updated"
           hash="md5:1e0d5cb8ef6ba40c99b14c0237be735e
                 sha-256:854f61290e2e197a11bc91063afce22e43f8ccc655237050ace766adc68dc784"
           length="14599"
           type="application/pdf"/>
  </url>
</urlset>
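On the receiving side, a Destination could parse a payload such as Example 4.1 and check the forward chronological ordering rule. The following is a minimal sketch. The function name is hypothetical, and the sketch treats `<lastmod>` and `<rs:md>` as required in every entry, which is an assumption made so that ordering can be verified. It also compares timestamps as strings, assuming a uniform UTC `...Z` format.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def parse_change_notification(payload: str) -> List[Tuple[Optional[str], str, Optional[str]]]:
    """Return (uri, lastmod, change) per entry, rejecting out-of-order payloads."""
    root = ET.fromstring(payload)
    entries = []
    for url in root.findall(f"{SM}url"):
        md = url.find(f"{RS}md")
        lastmod = url.findtext(f"{SM}lastmod")
        if md is None or lastmod is None:
            raise ValueError("each <url> needs <lastmod> and <rs:md> (assumption)")
        entries.append((url.findtext(f"{SM}loc"), lastmod.strip(), md.get("change")))
    # Forward chronological order: timestamps must be non-decreasing.
    stamps = [lastmod for _, lastmod, _ in entries]
    if stamps != sorted(stamps):
        raise ValueError("entries are not in forward chronological order")
    return entries
```

For Example 4.1, the result lists `http://example.com/res1` (13:03) before `http://example.com/res2` (13:23), both with the change type `updated`.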
A framework notification is sent on the appropriate Framework Notification channel, as described in Section 2, if a Source wishes to notify a Destination about changes to Resource Lists, Change Lists, Resource Dumps, Change Dumps, Capability Lists, and Source Descriptions. By subscribing to a Framework Notification channel, Destinations can refrain from periodically pulling these documents to determine whether they changed.
The format of a framework notification is very similar to the Change List format introduced in Section 10 of the core specification. It is based on the <urlset> document format introduced by the Sitemap protocol. It has the <urlset> root element and the following structure:

- An <rs:ln> child element of <urlset> with the relation type up. This element is mandatory when the framework notification pertains to a capability document or a Capability List.
- One <url> child element of <urlset> per change to the framework. This element does not have attributes, but uses child elements to convey information about the change to the framework. The <url> element has the following child elements:
  - A <loc> child element provides the URI of the changed capability document, Capability List, or Source Description.
  - An <rs:md> child element that must have two attributes. The first is the change attribute, which conveys the nature of the change. Possible values are created, updated, or deleted, and their use is as shown in Table 2.1. The second is the capability attribute, which indicates the component of the framework that has undergone the change. Possible values are changelist, resourcelist, changedump, resourcedump, capabilitylist, and description. For notifications about changes to archival capabilities, the values for the capability attribute defined in the Archive specification are used.
  - An <rs:ln> child element with the relation type index that points to an index document, if the changed document resides under one.

Framework notifications do not use the <sitemapindex> document format introduced by the Sitemap protocol.
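The following is a minimal sketch of a Source-side builder for framework notifications with the structure just listed. The `FrameworkChange` record and the `framework_notification` function are hypothetical. The set of accepted capability values deliberately covers only the core values listed above and omits archival ones. The sketch enforces the mandatory `up` link whenever an entry is not about the Source Description.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional

SM_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SM_NS)
ET.register_namespace("rs", RS_NS)

# Core capability values only; archival values are not covered by this sketch.
CAPABILITIES = {"changelist", "resourcelist", "changedump",
                "resourcedump", "capabilitylist", "description"}
CHANGES = {"created", "updated", "deleted"}


class FrameworkChange(NamedTuple):
    loc: str                     # URI of the changed document
    change: str                  # created / updated / deleted
    capability: str              # one of CAPABILITIES
    index: Optional[str] = None  # index the document resides under, if any


def framework_notification(changes: List[FrameworkChange],
                           up: Optional[str] = None) -> str:
    if up is None and any(c.capability != "description" for c in changes):
        raise ValueError("an 'up' link is mandatory for capability documents "
                         "and Capability Lists")
    urlset = ET.Element(f"{{{SM_NS}}}urlset")
    if up is not None:
        ET.SubElement(urlset, f"{{{RS_NS}}}ln", rel="up", href=up)
    for c in changes:
        if c.capability not in CAPABILITIES or c.change not in CHANGES:
            raise ValueError(f"unsupported entry: {c}")
        url = ET.SubElement(urlset, f"{{{SM_NS}}}url")
        ET.SubElement(url, f"{{{SM_NS}}}loc").text = c.loc
        ET.SubElement(url, f"{{{RS_NS}}}md", change=c.change, capability=c.capability)
        if c.index is not None:
            # Documents under an index point to it with rel="index".
            ET.SubElement(url, f"{{{RS_NS}}}ln", rel="index", href=c.index)
    return ET.tostring(urlset, encoding="unicode")
```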
Example 5.1 shows the payload of a framework notification informing the Destination about the availability of a new Resource List.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:ln rel="up"
         href="http://example.com/dataset1/capabilitylist.xml"/>
  <url>
    <loc>http://example.com/resourceset1/resourcelist.xml</loc>
    <rs:md change="created"
           capability="resourcelist"/>
  </url>
</urlset>
As shown in Figure 1 and Figure 2, framework notifications are never sent at the index level. If the Source sends a framework notification about a change to a document (e.g., a Resource List) that resides under an index, it must provide an <rs:ln> child element in the <url> element in which that change is communicated. The relation type of that link must be index, and the target of the link must be the index (e.g., the Resource List Index) under which the changed document resides.
Framework notifications will typically contain information about a single change to the framework. However, multiple such changes can be aggregated into a single framework notification. Example 5.2 shows the payload of a framework notification informing the Destination about a new Resource List, a new Resource Dump, and an updated Change List. The Resource List resides under an index, so the corresponding <url> element has an <rs:ln> child element with the relation type index. Note that the framework notification contains only one entry for a new Resource List that resides under an index, even though the index likely points to other new Resource Lists.
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:rs="http://www.openarchives.org/rs/terms/">
<rs:ln rel="up"
href="http://example.com/dataset1/capabilitylist.xml"/>
<url>
<loc>http://example.com/dataset1/resourcelist.xml</loc>
<rs:md change="created"
capability="resourcelist"/>
<rs:ln rel="index"
href="http://example.com/dataset1/resourcelist-index.xml"/>
</url>
<url>
<loc>http://example.com/dataset1/resourcedump.xml</loc>
<rs:md change="created"
capability="resourcedump"/>
</url>
<url>
<loc>http://example.com/dataset1/changelist.xml</loc>
<rs:md change="updated"
capability="changelist"/>
</url>
</urlset>
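Because a framework notification may announce only one of several new documents under an index, a Destination would typically retrieve the index rather than the single document. The following is a minimal sketch of that decision. The function name `documents_to_fetch` is hypothetical. Skipping deleted documents is a design choice of this sketch, because there is nothing to retrieve for them.

```python
import xml.etree.ElementTree as ET
from typing import List

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
RS = "{http://www.openarchives.org/rs/terms/}"


def documents_to_fetch(payload: str) -> List[str]:
    """List the URIs to retrieve after a framework notification, in order."""
    root = ET.fromstring(payload)
    targets: List[str] = []
    for url in root.findall(f"{SM}url"):
        md = url.find(f"{RS}md")
        if md is not None and md.get("change") == "deleted":
            continue  # nothing to retrieve for a deleted document
        # Prefer the index, so that sibling documents under it are discovered too.
        index = next((ln.get("href") for ln in url.findall(f"{RS}ln")
                      if ln.get("rel") == "index"), None)
        target = index or url.findtext(f"{SM}loc")
        if target and target not in targets:
            targets.append(target)
    return targets
```

Applied to Example 5.2, this yields the Resource List Index, the Resource Dump, and the Change List, in that order.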
The WebSocket protocol provides a bi-directional communication channel over a single TCP connection and is integrated with the web. It is standardized as RFC 6455, and (at the time of writing) the WebSocket API is being standardized by the W3C. It is supported by all of the major browsers, and many libraries and implementations exist for both server-side and client-side development. In addition, the developer tools of the Chrome browser include interfaces for debugging the messages exchanged between systems. Communication can take place over port 80 or 443, for example. This makes WebSockets easier to support in environments with firewalls at either the Source or the Destination, and web proxy support is available.
Each websocket has a URI, with the scheme ws:// for unencrypted websockets and wss:// for encrypted websockets (the equivalents of http:// and https://). When a client requests a connection to this URI, it negotiates with the server through an HTTP-based opening handshake to set up the communication channel. After that, the messages that pass back and forth over the websocket carry only minimal framing overhead.
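Part of the opening handshake is a check, defined in RFC 6455, that the server understood the client's request. The server appends a fixed GUID to the client's `Sec-WebSocket-Key` header, hashes the result with SHA-1, and returns the Base64 encoding in `Sec-WebSocket-Accept`. WebSocket libraries such as Tornado perform this step automatically. The following sketch only illustrates what the negotiation involves. The function name is hypothetical.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key from RFC 6455, Section 1.3.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```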
The implementation of this notification specification using the WebSocket protocol is straightforward. The messages, as described above, will only be passed from the server (Source) to the client (Destination). No messages are defined in this specification for client to server communication over the websocket. The notification channels are advertised in the Capability List as described in Section 3. Destinations should connect to the websocket as advertised, listen for messages and process them as if they had retrieved them from the pull-based capabilities defined in the ResourceSync Core Specification.
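Processing a pushed change notification "as if it had been retrieved" from a pull-based capability could look like the following sketch. The local mirror is modeled as a dictionary from resource URI to last known `lastmod`. The function name `apply_changes`, the tuple input format, and the string comparison of timestamps (which assumes a uniform UTC `...Z` format) are all assumptions of this illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def apply_changes(mirror: Dict[str, str],
                  entries: Iterable[Tuple[str, str, str]]) -> Set[str]:
    """Apply (uri, lastmod, change) entries to a local mirror.

    Returns the URIs whose current representation the Destination should
    (re)fetch from the Source.
    """
    to_fetch: Set[str] = set()
    for uri, lastmod, change in entries:
        if change == "deleted":
            mirror.pop(uri, None)
            to_fetch.discard(uri)
        elif change in ("created", "updated"):
            # Ignore duplicate or stale notifications for the same resource.
            if mirror.get(uri, "") < lastmod:
                mirror[uri] = lastmod
                to_fetch.add(uri)
        else:
            raise ValueError(f"unknown change type: {change}")
    return to_fetch
```

Replaying a notification that has already been applied returns an empty set, which helps when a Destination reconnects to a channel and receives messages again.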
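Example 6.1 shows a minimal implementation of a Change Notification channel in Python, using Tornado and the resync library. The function get_new_resourcesync_events() is left to the implementer.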
import tornado.ioloop
from tornado.web import Application
from tornado.websocket import WebSocketHandler, WebSocketClosedError
from resync import ChangeList, Resource


class MyApplication(Application):
    def __init__(self):
        handlers = [(r"/channels/dataset1/change_notification_channel",
                     ResyncSocketHandler)]
        super().__init__(handlers)


class ResyncSocketHandler(WebSocketHandler):
    # All currently connected Destinations.
    destinations = set()

    def open(self):
        ResyncSocketHandler.destinations.add(self)

    def on_close(self):
        ResyncSocketHandler.destinations.discard(self)

    @classmethod
    def send_updates(cls, msg):
        # Iterate over a copy, because failed sends remove entries from the set.
        for dest in list(cls.destinations):
            try:
                dest.write_message(msg)
            except WebSocketClosedError:
                cls.destinations.discard(dest)  # logging goes here


def send_events_callback():
    new_events = get_new_resourcesync_events()  # Implement me
    if new_events:
        cl = ChangeList()
        for event in new_events:
            cl.add(Resource(event.url, change=event.typ, lastmod=event.time))
        ResyncSocketHandler.send_updates(cl.as_xml())


app = MyApplication()
app.listen(80)
# Check for new events every 5 seconds (callback_time is in milliseconds).
notifier = tornado.ioloop.PeriodicCallback(send_events_callback, 5000)
notifier.start()
tornado.ioloop.IOLoop.current().start()
This specification is the collaborative work of NISO and the Open Archives Initiative. Funding for ResourceSync is provided by the Alfred P. Sloan Foundation. UK participation is supported by Jisc.
The names of individual contributors will be listed here when the final specification is released.
Date | Editor | Description |
---|---|---|
2013-11-22 | martin, herbert, rob, simeon | added note that PubSubHubbub will be used for initial experiments, not WebSockets |
2013-11-12 | martin, herbert, rob, simeon | version 0.8 |
This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.