FW: [OAI-implementers] Open Archives Initiative Protocol for Metadata Harvesting Version 2 news

Young,Jeff jyoung@oclc.org
Wed, 6 Feb 2002 13:59:55 -0500


> From: Walter Underwood [mailto:wunder@inktomi.com]
> Sent: Wednesday, February 06, 2002 12:10 PM
> The list interfaces are mostly needed for new items. We don't mind
> if the list is inconsistent or unsynchronized, as long as it has
> all the new stuff.

I guess I'm saying that resumptionTokens don't necessarily guarantee you'll
get "all the new stuff", but they could if appropriately implemented. The
stateless alternative, though, seems to assume an ideally static
repository. If records are deleted from the repository during a harvest, a
stateless harvesting solution doesn't seem to allow for the possibility of
getting all the new stuff.

Imagine a result set with 1 million records served in 1000-record chunks.
During the course of the harvest, 10 records get deleted from the repository.
Since the stateless solution relies on the position of a cursor, the
client's view of the cursor may be as many as 10 records beyond the server's
view, and thus records will be missed. Using resumptionTokens, however, I can
maintain a consistent cursor between client and server.
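
To make the cursor drift concrete, here is a small sketch in Python. It is
only an illustration under assumptions of mine: the repository is a sorted
list of integer identifiers, the stateless harvester pages by numeric offset,
and the resumptionToken simply carries the last identifier served. The
function names are hypothetical, and a real token is opaque to the harvester
and could encode its state quite differently.

```python
def harvest_by_offset(repo, chunk, to_delete):
    """Stateless paging: the client tracks a numeric offset into the list."""
    seen, offset, deleted = [], 0, False
    while offset < len(repo):
        seen.extend(repo[offset:offset + chunk])
        offset += chunk
        if not deleted:
            # Records vanish after the first chunk has been served.
            for rec in to_delete:
                repo.remove(rec)
            deleted = True
    return seen


def harvest_by_token(repo, chunk, to_delete):
    """Token paging: the token (here, the last identifier served) tells the
    server where to resume, independent of how many records were deleted."""
    seen, token, deleted = [], None, False
    while True:
        page = [r for r in repo if token is None or r > token][:chunk]
        if not page:
            break
        seen.extend(page)
        token = page[-1]
        if not deleted:
            for rec in to_delete:
                repo.remove(rec)
            deleted = True
    return seen


def missed(repo, seen):
    """Records still in the repository that the harvester never received."""
    return [r for r in repo if r not in seen]


repo_a = list(range(20))
missed_by_offset = missed(repo_a, harvest_by_offset(repo_a, 5, [1, 2]))

repo_b = list(range(20))
missed_by_token = missed(repo_b, harvest_by_token(repo_b, 5, [1, 2]))

print(missed_by_offset)  # [5, 6]
print(missed_by_token)   # []
```

With two deletions behind the cursor, the offset-based harvester skips two
live records, while the token-based one resumes from the right place.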

Also, I'd like to use OAI for internal operations within our organization.
Under those circumstances, I can make assumptions about the OAI
server/harvester, such as that records will never vanish and will instead be
flagged as deleted. With millions of records in our repository, I'd like to
avoid a complete reharvest wherever possible. I don't believe reharvests can
be avoided using stateless harvesting the way they can with stateful
harvesting.
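
As a rough sketch of what "flagged as deleted" buys the harvester, the Python
below keeps a local copy in sync incrementally. The Record type, the integer
datestamp, and both function names are assumptions for illustration only, not
anything defined by the protocol or by our server.

```python
from dataclasses import dataclass


@dataclass
class Record:
    identifier: str
    datestamp: int          # hypothetical change stamp, increases on each update
    deleted: bool = False   # deleted records stay in the repository, flagged


def changes_since(repo, since):
    """Server side: everything changed after 'since', including deletions."""
    return [r for r in repo if r.datestamp > since]


def apply_changes(local, changes, since):
    """Harvester side: update the local copy and return the new high-water mark."""
    for r in changes:
        if r.deleted:
            local.pop(r.identifier, None)
        else:
            local[r.identifier] = r
    return max((r.datestamp for r in changes), default=since)


repo = [Record("a", 1), Record("b", 2), Record("c", 3)]
local = {}
last = apply_changes(local, changes_since(repo, 0), 0)

# Later, "b" is flagged as deleted and "d" is added.
repo[1] = Record("b", 4, deleted=True)
repo.append(Record("d", 5))
last = apply_changes(local, changes_since(repo, last), last)

print(sorted(local))  # ['a', 'c', 'd']
```

Because the deletion arrives as an ordinary change, the second pass touches
only two records instead of reharvesting everything.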

Jeff