[OAI-implementers] issues with OAI-PMH specifications for OAI-Provider implementations using a cache

Tue Jun 2 10:19:42 EDT 2009

Hi,

I am very new to OAI, so I can offer only limited help and am mostly looking to learn myself.

The burden might be on the harvester.  Section 3 of the Implementation Guidelines for harvesters appears to address this, stating that harvesters should overlap requests.  It would also seem that a good implementation should base the "from" in subsequent requests on the datestamps found in the previously harvested records themselves, not on its own arbitrarily chosen "until" value.
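
To make that concrete, here is a rough sketch of how a harvester might pick its next "from" value.  It is only an illustration under my own assumptions: the function name next_from, the five-minute overlap, and seconds-granularity UTC datestamps are all my choices and do not come from the guidelines.

```python
from datetime import datetime, timedelta

# Assumes the repository uses seconds granularity (YYYY-MM-DDThh:mm:ssZ).
OAI_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def next_from(record_datestamps, previous_from, overlap=timedelta(minutes=5)):
    """Return the "from" argument for the next harvest (hypothetical helper).

    record_datestamps: datestamp strings taken from the records just harvested.
    previous_from: the "from" value used for the request that just finished.
    overlap: how far to step back from the newest datestamp actually seen.
    """
    if not record_datestamps:
        # Nothing came back, so do not advance; ask for the same window again.
        return previous_from
    latest = max(datetime.strptime(s, OAI_FORMAT) for s in record_datestamps)
    # Start slightly before the newest record seen, not at our own "until".
    return (latest - overlap).strftime(OAI_FORMAT)
```

Because the windows overlap, some records will come back more than once, so the harvester has to de-duplicate them, for example by identifier and datestamp.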

The latter problem you mention seems to be one that mirrors or load balancing can solve.  With a mirror, you essentially have two copies of the site and use an HTTP 302 status code to stop requests to the copy you are updating and redirect them to the other copy.  With a load balancer, this switching can be done invisibly to the harvester.
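
As a sketch of the mirror idea, the provider-side decision could look something like the following.  Everything here is hypothetical (the function name route_request, the example mirror URL, and the flag saying an update is running); it only shows the shape of the 302 redirect, not any real provider's code.

```python
def route_request(cache_updating, mirror_base_url, query_string):
    """Decide how to answer an incoming OAI-PMH request (hypothetical helper).

    Returns a (status_code, headers) pair.  While the local cache is being
    updated, the request is redirected to the mirror with its original
    query string; otherwise it is served locally.
    """
    if cache_updating:
        location = mirror_base_url + "?" + query_string
        return 302, {"Location": location}
    return 200, {}


# Example: this copy is mid-update, so the harvester is sent to the mirror.
status, headers = route_request(
    True, "http://mirror.example.org/oai", "verb=ListRecords&metadataPrefix=oai_dc"
)
```

This only works if the mirror itself is consistent when the switch happens, so the two copies would have to be updated one after the other.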

> Date: Tue, 2 Jun 2009 13:37:01 +0200
> From: Rozita.Fridman at FIZ-Karlsruhe.DE
> To: oai-implementers at openarchives.org
> Subject: [OAI-implementers] issues with OAI-PMH specifications for OAI-Provider implementations using a cache
> 
> Hello all,
> 
> we developed an OAI-Provider for Escidoc repositories.
> Escidoc-OAI-Provider is based on the Fedora-OAI-Provider, which uses a
> cache to reduce response time. Escidoc repositories are intended to
> contain multiple millions of objects. The Escidoc-Core framework only
> requires that object metadata stored in an Escidoc repository be
> well-formed XML structures. Therefore using a cache in the
> Escidoc-OAI-Provider is essential to ensure the validity of metadata in
> OAI-PMH responses and an acceptable response time.
> 
> But the current OAI-PMH protocol specification doesn't account for some
> issues caused by the use of a cache.
>  
> The main problem is the time lag between a harvester request and the
> last cache update:
> A harvester asks the OAI-Provider for all records that have changed
> between T0 and T2 in the underlying repository. The last cache update
> was at T1. The harvester gets records that have changed between T0 and
> T1, but assumes that it got all changes between T0 and T2. Therefore in
> the next request it asks for records that have changed between T2 and T3
> and misses all changes between T1 and T2. If the cache update interval
> is long and the next cache update takes place after T3, the harvester
> also misses all changes between T2 and T3, and so on.
>    
> One proposal would be to put a date stamp of the last cache update into
> the OAI-PMH response, in order to inform a harvester about possibly
> missed records. 
> 
> Does anybody face the same problem? What do you think about it? Maybe
> there are better solutions for this problem?
> 
> The other issue is that, depending on the OAI-Provider implementation, a
> cache may be in an inconsistent state while a cache update process is
> running. Are there means in the OAI-PMH protocol to respond to harvester
> requests during a cache update? A possible solution would be to respond
> with an HTTP status code 503 (Service Unavailable; section 3.1.2.2 of
> the specification), but the problem is specifying the Retry-After
> period. The duration of the cache update is not constant; it depends on
> the changes in the repository.
> 
> Thanks a lot,
> Rozita
> 
> 
> 
