[OAI-implementers] issues with OAI-PMH specifications for OAI-Provider implementations using a cache

Simeon Warner simeon.warner at cornell.edu
Tue Jun 2 09:29:22 EDT 2009


Hi Rozita,

The notion of including an explicit start-next-incremental-harvest-from date 
in the response is something I have thought about too. It would solve the 
cache problem you describe. I am not sure how much support there would be for 
such a change. What do others think?
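
To make the gap concrete, here is a toy Python sketch, not anything defined by 
the protocol. The serve() function, the integer timestamps and the record ids 
are all made up for illustration. It compares a harvester that resumes from 
its own request time T2 with one that resumes from a cache date T1 reported by 
the provider (the hypothetical extension discussed above).

```python
def serve(changes, cache_time, frm, until):
    """Return ids of records changed in [frm, until] that the cache already
    knows about, i.e. whose change time is not later than cache_time."""
    return {rid for rid, t in changes.items()
            if frm <= t <= until and t <= cache_time}

# Hypothetical repository: record id -> time of last change.
changes = {"a": 1, "b": 3, "c": 5}

# First harvest: request made at T2=4, cache last updated at T1=2.
first = serve(changes, cache_time=2, frm=0, until=4)

# Second harvest at T3=6, after the cache has been refreshed at time 6.
# Naive harvester resumes from its previous request time T2=4.
naive = first | serve(changes, cache_time=6, frm=4, until=6)

# Harvester that was told the cache date resumes from T1=2 instead.
hinted = first | serve(changes, cache_time=6, frm=2, until=6)

print(sorted(naive))   # record "b" (changed at 3) is never harvested
print(sorted(hinted))  # all three records are harvested
```

Resuming from T1 may re-deliver some records, which is harmless for a 
harvester that treats records as idempotent updates.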

One way to solve this using the current protocol without modification is to 
use days granularity and to make sure that the cache is updated at least once 
within each day (and that the update does not span a day boundary in UTC). 
That way, in your example, T1 = T2 always holds at day granularity.
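
A minimal Python sketch of the day-granularity idea follows. The helper names 
and the sample timestamps are assumptions for illustration only. It truncates 
timestamps to a UTC date (YYYY-MM-DD) and checks that a cache update does not 
span a UTC day boundary.

```python
from datetime import datetime, timedelta, timezone

def utc_day(ts: datetime) -> str:
    """Format an aware timestamp at day granularity (YYYY-MM-DD) in UTC."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")

def update_within_one_utc_day(start: datetime, end: datetime) -> bool:
    """True if a cache update began and ended on the same UTC day."""
    return utc_day(start) == utc_day(end)

# Hypothetical times: last cache update (T1) and harvester request (T2).
t1 = datetime(2009, 6, 2, 3, 15, tzinfo=timezone.utc)
t2 = datetime(2009, 6, 2, 13, 29, tzinfo=timezone.utc)
same_day = utc_day(t1) == utc_day(t2)  # T1 and T2 coincide at day granularity

# A local evening time can already be the next day in UTC.
edt = timezone(timedelta(hours=-4))
late_local = datetime(2009, 6, 2, 22, 0, tzinfo=edt)
```

The last value is there because the day boundary that matters is the UTC one, 
not the local one where the provider runs.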

If you opt for the 503 route, you could issue a second 503, or several, if 
the harvester comes back before the update is complete. This is really the 
only good approach if the cache is in an inconsistent state such that the 
idempotency requirements of the protocol are not met.
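
The repeated-503 exchange could look roughly like the Python sketch below. 
The function names, the response tuple and the default delays are assumptions 
for illustration, not part of any existing provider or harvester. The provider 
does not need to know how long the update will take: it returns a modest 
Retry-After estimate and simply answers 503 again if the harvester returns too 
early.

```python
import time
from typing import Callable, Dict, Tuple

# (HTTP status, headers, body); a simplified stand-in for a real response.
Response = Tuple[int, Dict[str, str], bytes]

def provider_response(cache_updating: bool, retry_after: int = 300) -> Response:
    """Answer 503 with Retry-After (in seconds) while the cache is updating."""
    if cache_updating:
        return 503, {"Retry-After": str(retry_after)}, b""
    return 200, {"Content-Type": "text/xml"}, b"<OAI-PMH/>"

def harvest(fetch: Callable[[], Response],
            sleep: Callable[[float], None] = time.sleep,
            max_attempts: int = 10) -> bytes:
    """Repeat the same request until it succeeds, waiting as instructed."""
    for _ in range(max_attempts):
        status, headers, body = fetch()
        if status == 200:
            return body
        if status != 503:
            raise RuntimeError(f"unexpected HTTP status {status}")
        # Fall back to 60 seconds if the provider sent no Retry-After.
        sleep(int(headers.get("Retry-After", "60")))
    raise RuntimeError("provider still unavailable after retries")

# Simulated run: the cache is busy for the first two requests.
states = iter([True, True, False])
waits = []
body = harvest(lambda: provider_response(next(states), retry_after=5),
               sleep=waits.append)
```

In the simulated run the sleep function only records the requested delays, so 
the example finishes immediately.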

Cheers.
Simeon



Fridman, Rozita wrote:
> Hello all,
> 
> we have developed an OAI-Provider for Escidoc repositories.
> Escidoc-OAI-Provider is based on the Fedora-OAI-Provider, which uses a
> cache to reduce response time. Escidoc repositories are intended to
> contain multiple millions of objects. The Escidoc-Core framework only
> requires that object metadata stored in an Escidoc repository are
> well-formed XML structures. Therefore the use of a cache in the
> Escidoc-OAI-Provider is essential to ensure the validity of metadata in
> OAI-PMH responses and an acceptable response time.
> 
> But the current OAI-PMH protocol specification doesn't account for some
> issues caused by the use of a cache.
>  
> The main problem is the time lag between a harvester request and the
> last cache update:
> A harvester asks the OAI-Provider for all records that have changed
> between T0 and T2 in the underlying repository. The last cache update
> was at T1. The harvester gets records that have changed between T0 and
> T1, but assumes that it got all changes between T0 and T2. Therefore in
> the next request it asks for records that have changed between T2 and T3
> and misses all changes between T1 and T2. If the cache update interval
> is long and the next cache update takes place after T3, the harvester
> also misses all changes between T2 and T3, and so on.
>    
> One proposal would be to put a date stamp of the last cache update into
> the OAI-PMH response, in order to inform a harvester about possibly
> missed records. 
> 
> Does anybody face the same problem? What do you think about it? Maybe
> there are better solutions for this problem?
> 
> The other issue is that, depending on the OAI-Provider implementation, a
> cache may be in an inconsistent state while a cache update process is
> running. Are there means in the OAI-PMH protocol to respond to harvester
> requests during a cache update? A possible solution would be to respond
> with an HTTP status code 503 (Service Unavailable, section 3.1.2.2 of the
> specification), but the problem is how to specify the Retry-After period.
> The duration of the cache update is not constant; it depends on the
> changes in the repository.
> 
> Thanks a lot,
> Rozita



