[OAI-implementers] on resource harvesting & datestamps

Fabio Simeoni Fabio.Simeoni at cis.strath.ac.uk
Thu Mar 3 13:38:33 EST 2005


Hi,
apologies if this has been brought up before.

I have just read "Resource Harvesting within the OAI-PMH Framework" (Van de
Sompel, Nelson, Lagoze, Warner), where issues of ambiguity in locating
resources as well as changes to resources have been related to the
inadequacy of simple metadata formats (primarily Dublin Core) in supporting
OAI-PMH-driven resource harvesting.

Part of the argument is based on the observation that, in the semantics of
the OAI-PMH, datestamps capture changes to metadata (to records, precisely),
but not changes to resources. The latter may then go unnoticed (if the
metadata has not changed) or be too eagerly assumed (if only the metadata
has changed). The problems are thus those of incomplete or inefficient
resource harvesting.
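
To make the two failure modes concrete, here is a minimal sketch in Python.
The Record class, its fields, the sample identifiers and the harvest
function are hypothetical names of mine, not part of the OAI-PMH or of the
paper; the only thing assumed from the protocol is that selective
harvesting filters on the record datestamp.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Record:
    identifier: str
    datestamp: date         # OAI-PMH datestamp: last change to the metadata record
    resource_changed: date  # hypothetical: last change to the resource itself,
                            # which the protocol does not expose


def harvest(records, from_date):
    """Selective harvesting: keep records whose datestamp is on or after from_date."""
    return [r for r in records if r.datestamp >= from_date]


records = [
    # resource changed after the last harvest, metadata did not
    Record("oai:example.org:1", date(2005, 1, 10), date(2005, 2, 20)),
    # metadata changed after the last harvest, resource did not
    Record("oai:example.org:2", date(2005, 2, 25), date(2004, 12, 1)),
]

last_harvest = date(2005, 2, 1)
selected = harvest(records, last_harvest)

# Incomplete harvesting: the resource changed but the record was not selected.
missed = [r.identifier for r in records
          if r.resource_changed >= last_harvest and r not in selected]

# Inefficient harvesting: the record was selected but the resource is unchanged.
redundant = [r.identifier for r in selected
             if r.resource_changed < last_harvest]
```

Under these assumptions the first record is missed and the second is
fetched again for nothing.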

Complex object formats are offered as a potential solution, for they
represent metadata along with the corresponding resources (by-value or
by-reference), so that any change to the latter induces change to the entire
representation and thus the updating of datestamps.

A couple of questions now:

1) how may complex object formats help with the problem of redundant
harvesting?
2) more importantly, is the propagation of change from resources to metadata
really dependent on the exchange format? Couldn't a provider use DC and yet
enforce a strong versioning policy which translates changes to resources
into new items (and thus records)? Even when (minor) changes are allowed to
preserve the identity of the resources, and thus no versioning takes place,
could not a provider reflect those changes in the datestamp of the
associated metadata records?
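
As a rough illustration of the second option in question 2, a provider-side
policy could be sketched as follows. This is only a sketch under my own
assumptions: record_datestamp, changed_since and the sample items are
hypothetical, and I assume the provider tracks the modification date of
each resource separately from that of its DC metadata.

```python
from datetime import date


def record_datestamp(metadata_modified, resource_modified):
    """Hypothetical provider policy: a record counts as changed when either
    its metadata or the resource it describes has changed."""
    return max(metadata_modified, resource_modified)


def changed_since(items, from_date):
    """Identifiers a harvester would see with from=from_date under this policy."""
    return [ident for ident, meta, res in items
            if record_datestamp(meta, res) >= from_date]


# (identifier, metadata last modified, resource last modified)
items = [
    ("oai:example.org:1", date(2005, 1, 10), date(2005, 2, 20)),
    ("oai:example.org:2", date(2004, 11, 5), date(2004, 12, 1)),
]

visible = changed_since(items, date(2005, 2, 1))
```

With this policy the first item is exposed to an incremental harvest even
though its DC metadata is unchanged. Note that it addresses incompleteness
only: a change to the metadata alone still updates the datestamp, so the
redundancy of question 1 remains.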

Overall, I wonder whether both problems are not related to the provider's
policy rather than the exchange format per se. At best, the adoption of
complex object formats reveals a commitment to change propagation -- among
many other complexity-inducing management policies. This commitment is then
most visible in the case of inclusion by-reference of content within the
overall resource representation; without it, the complex object format alone
would offer no solution. If it's about policy rather than format, then one
wonders whether a complex object format is really a minimal solution for the
datestamp problem. Put another way, should I adopt the overall complexity of
a complex object format to guarantee correctness of incremental resource
harvesting, or should I just introduce as much complexity as needed for the
problem and retain the low costs of a DC-based solution for the rest?

Any comment appreciated,

f


 
  	
##############################################
Fabio Simeoni 
Research Fellow
Department of Computer & Information Sciences
University of Strathclyde, Glasgow

TEL: +44 141 548 (3590)
FAX: +44 141 548 (4523)



