FW: [OAI-implementers] Open Archives Initiative Protocol for Meta data Harvesting Version 2 news

Walter Underwood wunder@inktomi.com
Wed, 06 Feb 2002 13:42:55 -0800


--On Wednesday, February 6, 2002 2:18 PM -0500 Simeon Warner <simeon@cs.cornell.edu> wrote:
>
> My main objection to including an option for harvesters to specify the
> maximum number of records they wish to get in a reply is that this will
> force ALL repositories to implement resumptionTokens.  Currently, small
> repositories (say a few thousand records) can happily ignore that part of
> the spec.

I suggest getting rid of resumption tokens to make it
simpler for all sizes of repositories.

A very simple server can always calculate the entire result list,
then send the portion requested, for example, records 21-30.
Cache the result list to speed things up.
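As a rough illustration of the approach above, here is a minimal Python sketch. It assumes a hypothetical server that accepts a 1-based `start` and a `count` (these are not OAI-PMH parameters; they stand in for however a harvester would ask for a range). The full result list is computed once per query and memoized with `functools.lru_cache`, and each request just slices it.

```python
from functools import lru_cache
from typing import List, Tuple

# Hypothetical repository contents: 100 record identifiers.
RECORDS: Tuple[str, ...] = tuple(f"oai:example.org:{i}" for i in range(1, 101))


@lru_cache(maxsize=32)
def compute_result_list(query: str) -> Tuple[str, ...]:
    """Build the complete, ordered result list for a query.

    The filter is a placeholder (substring match on the identifier);
    a real server would query its own storage here. Returning a tuple
    keeps the cached value immutable.
    """
    return tuple(r for r in RECORDS if query in r)


def get_page(query: str, start: int, count: int) -> List[str]:
    """Return records start .. start+count-1 (1-based) of the full list."""
    if start < 1 or count < 0:
        raise ValueError("start must be >= 1 and count must be >= 0")
    results = compute_result_list(query)  # cached after the first call
    return list(results[start - 1 : start - 1 + count])
```

For example, `get_page("example.org", 21, 10)` returns identifiers 21 through 30; a request that runs past the end of the list simply returns a shorter or empty page, so the server keeps no per-harvester state.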

Internally, this is much easier to implement than resumption
tokens. Caching is independent of the correctness of the list,
so the two are loosely coupled. For simple databases, the slow
part is getting data from disk, and the existing OS file cache
will already provide the most important level of caching.
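To make the loose coupling concrete, here is a minimal Python sketch, assuming a hypothetical `ResultCache` placed in front of an uncached `compute_result_list`. Keying the cache on a repository version number is an assumption of this sketch, not something from the message: after a change the version is bumped, so old lists are never looked up again.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def compute_result_list(records: Sequence[str], query: str) -> List[str]:
    """Uncached and always correct: filter and sort every time.

    The substring filter is a placeholder for a real query.
    """
    return sorted(r for r in records if query in r)


class ResultCache:
    """Optional cache placed in front of a result-list function.

    Keys are (query, version). Bumping `version` whenever the repository
    changes means stale lists are never returned; old entries are not
    evicted in this sketch.
    """

    def __init__(self, compute: Callable[[Sequence[str], str], List[str]]) -> None:
        self._compute = compute
        self._entries: Dict[Tuple[str, int], List[str]] = {}
        self.misses = 0

    def get(self, records: Sequence[str], version: int, query: str) -> List[str]:
        key = (query, version)
        if key not in self._entries:
            self.misses += 1
            self._entries[key] = self._compute(records, query)
        return list(self._entries[key])  # copy so callers cannot mutate the cache
```

Calling `compute_result_list` directly or through `ResultCache.get` gives the same list; removing the cache changes only speed, never the answer.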

Large systems will probably use commercial databases, which
provide additional levels of caching.

A repository with only a few thousand records could load them
into memory at startup and restart whenever there is a change.
At 1K per record, 10K records is only 10 Meg. No caching needed.
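A sketch of the load-everything-at-startup approach in Python, assuming a hypothetical on-disk layout of one file per record, named by its identifier. `memory_estimate` just repeats the back-of-envelope arithmetic: 10,000 records at 1,024 bytes each is 10,240,000 bytes, roughly 10 MB, before any per-object overhead of the language runtime.

```python
import os
from typing import Dict


def load_repository(directory: str) -> Dict[str, bytes]:
    """Read every record file in `directory` into memory at startup.

    Assumes a hypothetical layout of one file per record, where the file
    name is the record's identifier. Subdirectories are skipped. Restart
    the process to pick up changes; nothing is reloaded while running.
    """
    records: Dict[str, bytes] = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            with open(path, "rb") as f:
                records[name] = f.read()
    return records


def memory_estimate(num_records: int, bytes_per_record: int = 1024) -> int:
    """Rough payload size in bytes, ignoring Python object overhead."""
    return num_records * bytes_per_record
```

With everything held in a dict, every request is answered from memory, which is why no separate cache layer is needed at this size.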

wunder
--
Walter R. Underwood
Senior Staff Engineer
Inktomi Enterprise Search
http://search.inktomi.com/