[OAI-general] Open Archives Initiative Protocol for Metadata Harvesting Version 2 news

Carl Lagoze lagoze@cs.cornell.edu
Mon, 4 Feb 2002 12:04:48 -0500


Dear OAI community: 

In mid-2001 the Open Archives Initiative Technical Committee (OAI-TC) was
formed to develop and write version 2 of the Open Archives Protocol for
Metadata Harvesting (OAI-PMH).  In this email, we would like to inform you about:

* The context of this technical work;
* The process for undertaking the work;
* The schedule for the release of v.2.0 of the OAI-PMH;
* Anticipated changes in v.2.0 of the OAI-PMH.

Carl Lagoze and Herbert Van de Sompel

=> The context of this technical work was:

1. The original release of the OAI-PMH, version 1.x, was intended to
initiate a year-long period of experimentation with the protocol.  The goal
was to make this experimental version as stable as possible to encourage
usage and testing.  (In fact, only one change, from version 1.0 to 1.1, was
made during the year, in response to a W3C change in the XML Schema
specification.)

2. The OAI-TC work should, if possible, avoid adding significant
functionality to the protocol.  Instead, the scope of work should be to
resolve problems that arose over the past year from experience in
the user community.

3. While it was not deemed necessary that version 2.0 be backward compatible
with version 1.x, the upgrade path when version 2 is released should be
reasonably straightforward.

4. The result of the work, version 2, should be a stable, "standard"
release.  It remains undecided as to whether a formal standardization
process will be undertaken with the version 2 protocol.


=> The process for undertaking this work has been:

1. Formation of the OAI-TC, representing technical expertise from a
cross-section of the OAI community.  Conducting this work within a closed
technical committee follows the same procedure that was used successfully
for the development of OAI-PMH v. 1.x.  Members of OAI-TC are listed at
http://www.openarchives.org/organization/tech.comm.html.

2. Joint identification of issues 

3. Development of issue white papers 

4. Vetting of white papers to determine those that were in scope of OAI-TC
work 

5. Development of issue resolution 

6. On-line and phone meetings to reach final issue resolution 

7. Reporting and validation of the results of the work of OAI-TC to the OAI Steering Committee.
Members of OAI-SC are listed at
http://www.openarchives.org/news/oaiscpress000825.html

8. Protocol revision and writing 
      

=> The schedule for the release of v 2.0 of the protocol is as follows: 

1. March 1: release of the protocol to a limited group of alpha testers 

2. April 1: beta public release 

3. May 1: final public release 


=> The following is a summary of the changes that are anticipated for
version 2 of OAI-PMH: 

1. Dates and times - Standardize on UTC for all dates and times in protocol
requests ("from" and "until" arguments) and responses.
        
2. Harvesting granularity - Allow all ISO 8601 time granularities in dates
and times in the "from" and "until" arguments of protocol requests.  Allow a
data provider to expose its supported date/time granularity in the response
to an Identify request.  Default granularity is YYYY-MM-DD.

3. Flow control - Improve flow control by allowing the following optional
attributes when a resumptionToken is returned:    
* retryAfter - a suggested wait time until the request should be resubmitted
* expirationDate - the projected expiration of the resumptionToken
* completeListSize - total number of items across entire result set 
* cursor - index of first item in this batch within entire result set 

4. Set functionality - It will be possible to specify an identifier as
argument to the ListSets verb, permitting a harvester to inquire to which
sets an item belongs.  Responses to ListRecords and GetRecord will return
the sets to which each item belongs. Support of sets remains optional.

5. base-URL - Insulate harvesters from proxy servers by mandating that the
visible identity of the "handling server" in responses be that of a
persistent "master" that may opaquely reflect requests to slaves.

6. XML schema for mandatory Dublin Core - Coordinate with the DCMI so that
the schema used by the OAI is based on one managed by DCMI.  The schema must
allow inclusion of the xml:lang attribute (specifying the language of the
metadata value).
        
7. Deduping - Define an optional "provenance" XML container that can be
attached to metadata records that a data provider aggregates from other
sources.  This will help harvesters in detecting duplicates harvested from
multiple data providers. 

8. Error handling - Report OAI errors in OAI responses in a manner
independent of HTTP status codes. 

9. Set description - Define an optional XML container with which communities
can describe individual sets. 

10. Multiple metadata formats - Modify ListIdentifiers to permit a metadata
format as argument, filtering the return to include only record identifiers
that support the specified format.
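
As a rough illustration of how items 1, 2, 3, and 10 above might look from a
harvester's point of view, here is a minimal Python sketch.  It is not part of
the protocol text and makes several assumptions: the base URL is made up, the
argument name "metadataPrefix" is assumed to carry over from the existing
ListRecords and GetRecord verbs, and the resumptionToken element is assumed to
carry the four optional attributes exactly as named in item 3.  The final
version 2 syntax may differ.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical repository address, used only for illustration.
BASE_URL = "http://archive.example.org/oai"


def utc_datestamp(dt, granularity="day"):
    """Convert a timezone-aware datetime to a UTC ISO 8601 string.

    "day" yields YYYY-MM-DD (the default granularity in item 2);
    any other value yields seconds granularity with a trailing "Z".
    """
    dt = dt.astimezone(timezone.utc)
    if granularity == "day":
        return dt.strftime("%Y-%m-%d")
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


def list_identifiers_url(metadata_prefix, from_dt=None, until_dt=None,
                         granularity="day"):
    """Build a ListIdentifiers request filtered by metadata format (item 10).

    The argument name "metadataPrefix" is an assumption, not confirmed
    by the announcement.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if from_dt is not None:
        params["from"] = utc_datestamp(from_dt, granularity)
    if until_dt is not None:
        params["until"] = utc_datestamp(until_dt, granularity)
    return BASE_URL + "?" + urlencode(params)


def parse_resumption_token(xml_text):
    """Read the optional flow-control attributes listed in item 3.

    Missing attributes come back as None, since all four are optional.
    """
    elem = ET.fromstring(xml_text)

    def as_int(name):
        value = elem.get(name)
        return int(value) if value is not None else None

    return {
        "token": (elem.text or "").strip(),
        "retryAfter": elem.get("retryAfter"),
        "expirationDate": elem.get("expirationDate"),
        "completeListSize": as_int("completeListSize"),
        "cursor": as_int("cursor"),
    }


if __name__ == "__main__":
    # A local time in US Eastern Standard Time, normalized to UTC on output.
    eastern = timezone(timedelta(hours=-5))
    since = datetime(2002, 2, 4, 12, 4, 48, tzinfo=eastern)
    print(list_identifiers_url("oai_dc", from_dt=since))

    # Hypothetical response fragment; the attribute values are invented.
    sample = ('<resumptionToken expirationDate="2002-03-01T00:00:00Z" '
              'completeListSize="500" cursor="100">abc123</resumptionToken>')
    print(parse_resumption_token(sample))
```

A harvester built along these lines would keep requesting with the returned
token until no token comes back, and could use completeListSize and cursor to
report progress.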