[OAI-implementers] issues with OAI-PMH specifications for OAI-Provider implementations using a cache

Michael Nelson mln at cs.odu.edu
Tue Jun 2 11:21:39 EDT 2009

On Tue, 2 Jun 2009, Hussein Suleman wrote:

> hi Rozita
> if you use a purpose-built cache, hopefully you can update the datestamp in 
> the cache so the datestamps of the cache are used to answer queries instead 
> of the original datestamps ... if you do this, you will not have a problem, 
> and i do believe this is the recommended OAI-PMH usage for 
> hierarchical/intermediate systems (i am sure it is written down somewhere but 
> i can't recall where)

I just realized that my response was essentially the same as Hussein's 
here -- I should have sent my mesg in reply & support of this one.
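
To make the idea concrete, here is a minimal sketch of a cache that answers 
from/until queries with its own datestamps. The names (CachedRecord, 
RecordCache, the utc helper) and the times are made up for illustration; 
they are not taken from the Escidoc or Fedora providers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CachedRecord:
    identifier: str
    metadata: str
    source_datestamp: datetime  # when the record changed in the repository
    cache_datestamp: datetime   # when that change reached the cache


class RecordCache:
    """Hypothetical cache that exposes its own datestamps to harvesters."""

    def __init__(self):
        self._records = {}

    def update(self, identifier, metadata, source_datestamp, now):
        # Stamp the record with the cache update time, not the source time.
        self._records[identifier] = CachedRecord(
            identifier, metadata, source_datestamp, now
        )

    def list_records(self, from_=None, until=None):
        # Selective harvesting compares against cache datestamps only.
        selected = [
            r for r in self._records.values()
            if (from_ is None or r.cache_datestamp >= from_)
            and (until is None or r.cache_datestamp <= until)
        ]
        return sorted(selected, key=lambda r: r.cache_datestamp)


def utc(hour, minute=0):
    # Illustrative helper: a time on the day of this thread, in UTC.
    return datetime(2009, 6, 2, hour, minute, tzinfo=timezone.utc)


cache = RecordCache()
# The record changed in the repository at 10:30 but reached the cache at 12:00.
cache.update("oai:example:1", "<dc/>", source_datestamp=utc(10, 30), now=utc(12))

first = cache.list_records(from_=utc(9), until=utc(11))    # harvest T0..T2
second = cache.list_records(from_=utc(11), until=utc(13))  # harvest T2..T3
```

The first harvest does not see the record, but the second one does, because 
the record's visible datestamp is 12:00 rather than 10:30. Had the cache 
exposed the source datestamp, the second harvest would have skipped it.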



> then, regarding cache downtime, i was going to say what Simeon has just 
> written regarding using multiple 503s ...
> (a day granularity may be restrictive, but it does depend on specifics of 
> your application)
> regarding the metadata issue, the reason for the requirement is so that 
> metadata records are self-contained and can be stored, verified and moved 
> around without losing namespace information. this requirement exists to some 
> degree because OAI-PMH was designed in the early and somewhat "wild-west" 
> days of XML when XML parsers were not very namespace-aware ... although i 
> should add that even today if you programmatically extract an XML sub-tree 
> with many parsing tools, you will still not have fully validatable 
> (valid?) XML unless namespace information is in the inner tags ... so it is 
> all about maintaining verification information within records come what may 
> ...
> ttfn,
> ----hussein
> =====================================================================
> hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com
> =====================================================================
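
On the namespace point above, a small sketch of what goes wrong. It cuts 
the inner element out as text, which stands in for the "many parsing tools" 
case and is an assumption on my part. The helper names and the two sample 
documents are made up.

```python
import xml.etree.ElementTree as ET

# Namespace declared only on the outer wrapper element.
OUTER_ONLY = (
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example</dc:title>"
    "</metadata>"
)

# Namespace declared on the inner element itself (self-contained).
SELF_CONTAINED = (
    "<metadata>"
    '<dc:title xmlns:dc="http://purl.org/dc/elements/1.1/">Example</dc:title>'
    "</metadata>"
)


def cut_inner(xml_text):
    # Naive textual extraction of the inner dc:title element.
    start = xml_text.index("<dc:title")
    end = xml_text.index("</dc:title>") + len("</dc:title>")
    return xml_text[start:end]


def parses_standalone(fragment):
    # True if the fragment is namespace-well-formed on its own.
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


outer_ok = parses_standalone(cut_inner(OUTER_ONLY))
inner_ok = parses_standalone(cut_inner(SELF_CONTAINED))
```

The fragment cut from OUTER_ONLY has an unbound dc: prefix and fails to 
parse by itself, while the one cut from SELF_CONTAINED carries its 
declaration with it and can be stored or moved as is.
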
> Fridman, Rozita wrote:
>> Hello all,
>> we developed an OAI-Provider for Escidoc repositories.
>> Escidoc-OAI-Provider is based on the Fedora-OAI-Provider, which uses a
>> cache to reduce response time. Escidoc repositories are intended to
>> contain several million objects. The Escidoc-Core framework only
>> requires that object metadata stored in an Escidoc repository be
>> well-formed XML. Therefore the use of a cache in the
>> Escidoc-OAI-Provider is essential to ensure the validity of metadata in
>> OAI-PMH responses and an acceptable response time.
>> But the current OAI-PMH protocol specification doesn't account for some
>> issues, caused by the employment of a cache.
>> The main problem is the time lag between a harvester request and the
>> last cache update:
>> A harvester asks the OAI-Provider for all records that have changed
>> between T0 and T2 in the underlying repository. The last cache update
>> was at T1. The harvester gets the records that changed between T0 and
>> T1, but assumes that it got all changes between T0 and T2. Therefore in
>> the next request it asks for records that have changed between T2 and T3
>> and misses all changes between T1 and T2. If the cache update interval
>> is long and the next cache update takes place after T3, the harvester
>> also misses all changes between T2 and T3, and so on.
>>    One proposal would be to put a date stamp of the last cache update into
>> the OAI-PMH response, in order to inform a harvester about possibly
>> missed records. 
>> Does anybody face the same problem? What do you think about it? Maybe
>> there are better solutions for this problem?
>> The other issue is that depending on the OAI-Provider implementation a
>> cache may be in an inconsistent state while a cache update process is
>> running. Are there means in the OAI-PMH protocol to respond to harvester
>> requests during a cache update? A possible solution would be to respond
>> with HTTP status code 503 Service Unavailable (section of the
>> specification), but the problem is how to specify the Retry-After
>> period. The duration of a cache update is not constant; it depends on
>> the changes in the repository.
>> Thanks a lot,
>> Rozita
>> ------------------------------------------------------------------------
>> -------------------------------------------------------
>> Fachinformationszentrum Karlsruhe, Gesellschaft für 
>> wissenschaftlich-technische Information mbH. Sitz der Gesellschaft: 
>> Eggenstein-Leopoldshafen, Amtsgericht Mannheim HRB 101892. 
>> Geschäftsführerin: Sabine Brünger-Weilandt. Vorsitzender des Aufsichtsrats: 
>> MinR Hermann Riehl.
>> ------------------------------------------------------------------------
>> _______________________________________________
>> OAI-implementers mailing list
>> List information, archives, preferences and to unsubscribe:
>> http://www.openarchives.org/mailman/listinfo/oai-implementers
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://www.openarchives.org/mailman/listinfo/oai-implementers
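
For the cache-update question, a rough sketch of the "multiple 503s" idea: 
the provider never has to predict how long the update takes, it just 
answers 503 with a short Retry-After for as long as the update is running. 
CacheState, handle_request and the 300-second interval are illustrative 
assumptions, not part of any existing provider.

```python
import threading

RETRY_AFTER_SECONDS = 300  # assumed short, fixed interval


class CacheState:
    """Hypothetical flag shared by the updater and the request handler."""

    def __init__(self):
        self._updating = threading.Event()

    def begin_update(self):
        self._updating.set()

    def end_update(self):
        self._updating.clear()

    def is_updating(self):
        return self._updating.is_set()


def handle_request(state, answer_from_cache):
    # While the cache is being rebuilt, ask the harvester to come back later.
    # If the update is still running then, it simply gets another 503.
    if state.is_updating():
        return 503, {"Retry-After": str(RETRY_AFTER_SECONDS)}, b""
    return 200, {"Content-Type": "text/xml"}, answer_from_cache()


state = CacheState()
state.begin_update()
during = handle_request(state, lambda: b"<OAI-PMH/>")
state.end_update()
after = handle_request(state, lambda: b"<OAI-PMH/>")
```

A harvester that honors Retry-After will keep retrying until it receives 
the normal response.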

Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/
Dept of Computer Science, Old Dominion University, Norfolk VA 23529
+1 757 683 6393 +1 757 683 4900 (f)
