[OAI-implementers] Open Archives Initiative Protocol for Metadata Harvesting Version 2 news

Walter Underwood wunder@inktomi.com
Mon, 04 Feb 2002 14:11:18 -0800

--On Monday, February 4, 2002 12:04 PM -0500 Carl Lagoze <lagoze@cs.cornell.edu> wrote:
> 1. Dates and times - Standardize on UTC for all dates and times in protocol
> requests ("from" and "until" arguments) and responses.

Excellent idea.

> 2. Harvesting Granularity - Allow all ISO8601 time granularities in dates
> and times in the "from" and "until" arguments of protocol requests.  Allow a
> data provider to expose its supported date/time granularity in the response to
> an Identify request.  Default granularity is YYYY-MM-DD.

Poor idea. This multiplies the test cases massively and makes the
protocol implementation more complex. Now the spec must define what
"until 2002-02-02" means (beginning or end of the day?), and it must
be tested for all granularities in all arguments, and all combinations.
Some clients don't even need to send day granularities. Why should
they be saddled with this complexity?
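
To make the "until 2002-02-02" ambiguity concrete, here is a minimal
sketch in Python. The helper name until_bounds and the sample record
timestamp are hypothetical and are not taken from the protocol; the
sketch only shows that the two readings of a day-granularity bound
disagree about a record stamped in the middle of that day.

```python
from datetime import datetime, timedelta, timezone


def until_bounds(day):
    """Return two plausible UTC cutoffs for a day-granularity 'until' value.

    Hypothetical helper for illustration; neither reading is asserted
    to be what the spec intends.
    """
    start = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Reading 1: the bound is the start of the named day (00:00:00Z).
    # Reading 2: the bound covers the whole named day, so the cutoff
    # is midnight at the start of the following day.
    return start, start + timedelta(days=1)


start_of_day, end_of_day = until_bounds("2002-02-02")

# A record modified in the afternoon of the day in question.
record_stamp = datetime(2002, 2, 2, 15, 30, tzinfo=timezone.utc)

print(record_stamp < start_of_day)  # is the record selected under reading 1?
print(record_stamp < end_of_day)    # is the record selected under reading 2?
```

The same question has to be answered, and tested, again for every
additional granularity that is allowed.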

> 3. Flow control - Improve flow control by allowing the following optional
> attributes when a resumptionToken is returned:
> * retryAfter - a suggested wait time until the request should be resubmitted
> * expirationDate - the projected expiration of the resumptionToken
> * completeListSize - total number of items across entire result set
> * cursor - index of first item in this batch within entire result set

This is surprisingly close to the list approach I suggested. If this were
changed so that the client sends the cursor and the number of records
requested, the resumption token would no longer be needed. Think of it as
moving the database cursor from the server to the client, which offloads
the state from the server. This approach is proven in high-load
applications like LDAP and HTTP search engines.
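
The client-held cursor can be sketched in a few lines of Python. This
is an illustration under stated assumptions, not part of any OAI
specification: the in-memory RECORDS list, the example identifiers,
and the function names list_records and harvest are all hypothetical.

```python
# Hypothetical repository contents; a real server would query a database.
RECORDS = ["oai:example:%d" % n for n in range(10)]


def list_records(cursor, count):
    """Stateless server side: each request carries all the paging state."""
    return RECORDS[cursor:cursor + count]


def harvest(batch_size):
    """Client side: owns the cursor and chooses how many records to ask for."""
    harvested = []
    cursor = 0
    while True:
        batch = list_records(cursor, batch_size)
        if not batch:
            break  # an empty batch means the list is exhausted
        harvested.extend(batch)
        cursor += len(batch)
    return harvested


print(len(harvest(3)))
```

Because the server keeps nothing between requests, there is no token
to expire, and the client never receives more records per response
than it asked for.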

The 2.0 approach still has a problem: a correctly implemented server
can crash a correctly implemented client, because the server alone
decides how many records go into each response. That is very bad.
To fix that, the client must be able to specify the desired number
of records.

completeListSize is an excellent thing to return. It is somewhat
simpler to return that every time, rather than optionally as part
of the resumption token.
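
A sketch of what "return it every time" could look like, again in
Python and again with hypothetical names (list_page,
harvest_with_total, and the dictionary layout are assumptions made for
illustration; only completeListSize and cursor come from the proposal
quoted above):

```python
def list_page(records, cursor, count):
    """Hypothetical response: the total is included in every reply."""
    return {
        "completeListSize": len(records),
        "cursor": cursor,
        "records": records[cursor:cursor + count],
    }


def harvest_with_total(records, batch_size):
    """Client loop that stops once the cursor reaches the reported total."""
    harvested = []
    cursor = 0
    total = None
    while total is None or cursor < total:
        page = list_page(records, cursor, batch_size)
        total = page["completeListSize"]
        if not page["records"]:
            break  # guard in case the list shrank between requests
        harvested.extend(page["records"])
        cursor += len(page["records"])
    return harvested, total


items, total = harvest_with_total(list(range(7)), 3)
print(len(items), total)
```

With the total always present, the client has a single code path for
every response and can report progress from the first batch onward.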

Walter R. Underwood
Senior Staff Engineer
Inktomi Enterprise Search