[OAI-implementers] issues with OAI-PMH specifications for OAI-Provider implementations using a cache

Fridman, Rozita Rozita.Fridman at FIZ-Karlsruhe.DE
Tue Jun 2 11:27:40 EDT 2009


Hi Hussein,

thanks for your response.

> if you use a purpose-built cache, hopefully you can update the datestamp
> in the cache so the datestamps of the cache are used to answer queries
> instead of the original datestamps ... if you do this, you will not have
> a problem, and i do believe this is the recommended OAI-PMH usage for
> hierarchical/intermediate systems (i am sure it is written down
> somewhere but i can't recall where)
> 

I found the Guidelines for Aggregators, Caches and Proxies (http://www.openarchives.org/OAI/2.0/guidelines-aggregator.htm). They require replacing the original repository datestamps with the datestamps of harvesting. But as I understand it, they are about aggregators that themselves harvest data from other repositories. The Escidoc OAI-Provider is intended to behave like an immediate OAI provider of the Escidoc repository, not like an intermediate node.
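
To make the difference concrete, here is a minimal sketch of how answering from cache datestamps instead of repository datestamps closes the harvesting gap. It is only an illustration: the `CachedRecord` class, the function names and the integer timeline (T0=0, cache update at 10, harvest at T2=20, cache update at 25, harvest at T3=30) are my own assumptions, not part of the Escidoc or Fedora OAI-Provider.

```python
from dataclasses import dataclass


@dataclass
class CachedRecord:
    identifier: str
    repo_datestamp: int   # when the record changed in the repository
    cache_datestamp: int  # when that change reached the cache


def list_records(cache, from_, until, use_cache_datestamp):
    """Return identifiers whose chosen datestamp lies in [from_, until]."""
    selected = []
    for rec in cache:
        stamp = rec.cache_datestamp if use_cache_datestamp else rec.repo_datestamp
        if from_ <= stamp <= until:
            selected.append(rec.identifier)
    return selected


def harvest_sequence(use_cache_datestamp):
    """Simulate two harvests (0..20 and 20..30) around a late cache update."""
    # rec-a changed at 5 and was cached by the update at 10.
    cache = [CachedRecord("rec-a", repo_datestamp=5, cache_datestamp=10)]
    first = list_records(cache, 0, 20, use_cache_datestamp)
    # rec-b changed at 15 but only reaches the cache with the update at 25.
    cache.append(CachedRecord("rec-b", repo_datestamp=15, cache_datestamp=25))
    second = list_records(cache, 20, 30, use_cache_datestamp)
    return first, second
```

With `use_cache_datestamp=False` the second harvest never sees rec-b, because its repository datestamp (15) lies before the requested window. With `use_cache_datestamp=True` it is returned in the second harvest.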


> then, regarding cache downtime, i was going to say what Simeon has just
> written regarding using multiple 503s ...
> 
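
As I understand the suggestion, the provider does not need to know how long the cache update will take. It can give a short estimate in Retry-After and simply answer 503 again if the update is still running. A rough sketch of the harvester side, assuming Retry-After is given in seconds (HTTP also allows a date there, which this sketch does not handle). `fetch` and `make_provider` are hypothetical stand-ins for the real HTTP request and provider:

```python
import time


def harvest_with_retry(fetch, max_attempts=5, sleep=time.sleep):
    """Call fetch() until it stops answering 503, honouring Retry-After.

    fetch is a hypothetical callable returning (status, headers, body).
    """
    for _ in range(max_attempts):
        status, headers, body = fetch()
        if status != 503:
            return status, body
        # Wait for the advertised period (default 60 s is an assumption),
        # then ask again; the provider may well answer 503 once more.
        sleep(int(headers.get("Retry-After", "60")))
    raise RuntimeError("provider still unavailable after retries")


def make_provider(busy_responses, retry_after=120):
    """Simulate a provider whose cache update outlasts several requests."""
    remaining = [busy_responses]

    def fetch():
        if remaining[0] > 0:
            remaining[0] -= 1
            return 503, {"Retry-After": str(retry_after)}, ""
        # Placeholder body; a real response would be a full OAI-PMH document.
        return 200, {}, "<OAI-PMH>...</OAI-PMH>"

    return fetch
```

Passing `sleep=waits.append` with a list `waits` records the waiting periods instead of actually sleeping, which is convenient for trying this out.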
> (a day granularity may be restrictive, but it does depend on specifics
> of your application)
> 
> regarding the metadata issue, the reason for the requirement is so that
> metadata records are self-contained and can be stored, verified and
> moved around without losing namespace information. this requirement
> exists to some degree because OAI-PMH was designed in the early and
> somewhat "wild-west" days of XML when XML parsers were not very
> namespace-aware ... although i should add that even today if you
> programmatically extract an XML sub-tree with many parsing tools, you
> will still not have fully validatable (valid?) XML unless
> namespace information is in the inner tags ... so it is all about
> maintaining verification information within records come what may ...

My question is why the attribute "xmlns:xsi" is required in the metadata part.
Of course the namespace URI of the metadata itself must be declared in the metadata part.
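
One case where I can see it mattering is when the metadata root element carries an xsi:schemaLocation attribute. That is an assumption about the record, not something the quoted explanation states. If the xsi prefix is declared only on an outer element, the extracted metadata part no longer parses on its own. A small sketch with Python's standard XML parser, using the oai_dc namespace and schema location as example values:

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Metadata root as it looks when cut out of a response in which
# xmlns:xsi was declared only on an enclosing element.
without_xsi = (
    f'<oai_dc:dc xmlns:oai_dc="{OAI_DC}" '
    f'xsi:schemaLocation="{OAI_DC} http://www.openarchives.org/OAI/2.0/oai_dc.xsd"/>'
)

# The same element with the xsi prefix declared on the metadata root itself.
with_xsi = without_xsi.replace("<oai_dc:dc ", f'<oai_dc:dc xmlns:xsi="{XSI}" ', 1)


def parses_standalone(fragment):
    """Return True if the fragment is namespace-well-formed on its own."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False
```

Here `parses_standalone(without_xsi)` fails because the xsi prefix is unbound, while `parses_standalone(with_xsi)` succeeds.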

Best regards,
Rozita
> 
> ttfn,
> ----hussein
> 
> =====================================================================
> hussein suleman ~ hussein at cs.uct.ac.za ~ http://www.husseinsspace.com
> =====================================================================
> 
> 
> Fridman, Rozita wrote:
> > Hello all,
> >
> > we developed an OAI-Provider for Escidoc repositories.
> > The Escidoc-OAI-Provider is based on the Fedora-OAI-Provider, which uses a
> > cache to reduce response time. Escidoc repositories are intended to contain
> > many millions of objects. The Escidoc-Core framework only requires
> > that object metadata stored in an Escidoc repository be well-formed
> > XML structures. Therefore using a cache in the Escidoc-OAI-Provider
> > is essential to ensure the validity of metadata in OAI-PMH responses and an
> > acceptable response time.
> >
> > But the current OAI-PMH protocol specification doesn't account for some
> > issues caused by the use of a cache.
> >
> > The main problem is the time lag between a harvester request and the last
> > cache update:
> > A harvester asks the OAI-Provider for all records that have changed
> > between T0 and T2 in the underlying repository. The last cache update
> > was at T1. The harvester gets the records that changed between T0 and
> > T1, but assumes that it got all changes between T0 and T2. Therefore in
> > the next request it asks for records that have changed between T2 and T3
> > and misses all changes between T1 and T2. If the cache update interval
> > is long and the next cache update takes place after T3, the harvester
> > also misses all changes between T2 and T3, and so on.
> >
> > One proposal would be to put the datestamp of the last cache update into
> > the OAI-PMH response, in order to inform a harvester about possibly
> > missed records.
> >
> > Does anybody face the same problem? What do you think about it? Maybe
> > there are better solutions for this problem?
> >
> > The other issue is that, depending on the OAI-Provider implementation, a
> > cache may be in an inconsistent state while a cache update process is
> > running. Are there means in the OAI-PMH protocol to respond to harvester
> > requests during a cache update? A possible solution would be to respond
> > with HTTP status code 503 (Service Unavailable; section 3.1.2.2 of the
> > specification), but the problem is specifying the Retry-After period. The
> > duration of a cache update is not constant; it depends on the changes
> > in the repository.
> >
> > Thanks a lot,
> > Rozita
> >
> >
> > -------------------------------------------------------
> >
> > Fachinformationszentrum Karlsruhe, Gesellschaft für wissenschaftlich-technische Information mbH.
> > Registered office: Eggenstein-Leopoldshafen; Amtsgericht Mannheim (district court), HRB 101892.
> > Managing Director: Sabine Brünger-Weilandt.
> > Chairman of the Supervisory Board: MinR Hermann Riehl.
> >
> > _______________________________________________
> > OAI-implementers mailing list
> > List information, archives, preferences and to unsubscribe:
> > http://www.openarchives.org/mailman/listinfo/oai-implementers
> >






