[OAI-implementers] harvester guidelines
Jasper Op de Coul
opdecoul at ubib.eur.nl
Thu May 26 11:54:30 EDT 2011
On 05/26/2011 03:02 PM, McGath, Gary wrote:
>> From: oai-implementers-bounces at openarchives.org [mailto:oai-
>> implementers-bounces at openarchives.org] On Behalf Of Jasper Op de Coul
>> Sent: Thursday, May 26, 2011 6:44 AM
>> To: OAI-implementers at openarchives.org
>> Subject: [OAI-implementers] harvester guidelines
>> Hi list,
>> I've been doing some work with OAI-PMH harvesters lately, and would like
>> to share some of my experiences on the subject.
>> When harvesting specific sets with the `set` param, there is an issue
>> that a harvester is not notified when a record is removed from that set.
>> I think most implementers are aware of this, and it is the biggest hole
>> in the specification.
>> For example: A specific set is harvested, but at a later time one of the
>> records is no longer part of that set. The record then disappears from
>> the feed, but the harvester is never notified because there is no
> Implementers of services could avoid this problem by adopting a policy of never removing a record from a set. If its placement in a set turns out to be erroneous or outdated, the service would delete and re-add the record. Of course, this only helps with the services that adopt and announce the policy, and uprooting the old record could be a problem in some scenarios, but it sounds like a reasonable policy to adopt, with the advantage outweighing the downside.
> One problem I can think of is that it could get messy if there are major changes in set organization, resulting in large numbers of bookkeeping deletions.
Yes, solving it with a policy is probably the best option, especially if
your sets don't change that often.
It is possible to solve the problem on the service side, but it is not
trivial, and kind of a hack:
If a server receives a request with a set parameter, it could respond by
returning not only the records from that set but all records in the
repository, marking every record outside the chosen set as deleted.
This would be confusing for a client, since the server returns records
that were not in the set the client asked for. So the server should also
add the requested setSpec to all the other records. Adding the setSpec
and deleted headers would be trivial to do in the HTTP server, and they
should not be stored in the database.
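
A minimal sketch of the response logic described above, in Python. The
Record class and the function name are hypothetical and only illustrate
the idea; this is not code from any existing server.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    # Hypothetical in-memory stand-in for a stored repository record.
    identifier: str
    set_specs: List[str] = field(default_factory=list)
    deleted: bool = False


def list_records_for_set(repository, requested_set):
    """Return every record; records outside requested_set are reported
    as deleted and get the requested setSpec added to their header."""
    response = []
    for rec in repository:
        in_set = requested_set in rec.set_specs
        specs = list(rec.set_specs)
        if not in_set:
            # Added only in the response, never written back to storage.
            specs.append(requested_set)
        response.append(Record(rec.identifier, specs, rec.deleted or not in_set))
    return response


repo = [Record("oai:example:1", ["a"]), Record("oai:example:2", ["b"])]
out = list_records_for_set(repo, "a")
```

The stored records are left untouched; only the copies in the response
carry the extra setSpec and the deleted flag.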
However, this scenario could lead to problems if a client does multiple
harvests of different sets. In that case a record could be marked as
deleted in one set, while it is not deleted in another set. If the
harvested data is stored in one database (which is common), these
records would overwrite each other.
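
The overwrite problem can be illustrated with a small sketch. It assumes
a harvester that keys its store on the OAI identifier alone, and two
made-up responses from a server using the trick above; all names and
identifiers here are hypothetical.

```python
def harvest(store, response):
    # Assumption: the harvester keeps one entry per OAI identifier.
    for identifier, deleted in response:
        store[identifier] = {"deleted": deleted}


store = {}
# Record 1 is only in set "a", record 2 is only in set "b".
response_set_a = [("oai:example:1", False), ("oai:example:2", True)]
response_set_b = [("oai:example:1", True), ("oai:example:2", False)]

harvest(store, response_set_a)
harvest(store, response_set_b)
# Record 1 now looks deleted, although it is still part of set "a".
```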
In the MOAI server we can make many OAI-PMH feeds out of one OAI-PMH feed,
based on the setSpec headers. Every set could basically have its own
OAI-PMH feed that contains just the records from that set, with all other
records marked as deleted. The harvester could then harvest the feed
without the need to specify a set parameter. Furthermore, each of these
OAI-PMH feeds could use slightly different oai:ids so that there would
not be any collisions when the harvested data is merged into a single
This does not completely solve the problem since you have to get
harvesters to use these different feeds instead of harvesting the 'main'
feed with set params. But for harvesters that use these feeds you have
eliminated the problem, without too much bookkeeping.
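
A rough sketch of the per-set feed idea, in Python. This is not MOAI's
actual API: the function name and the way the set name is appended to the
identifier are assumptions made only for illustration.

```python
def feed_for_set(records, set_spec):
    """Build one feed for set_spec: members are live, all others deleted."""
    feed = []
    for identifier, set_specs in records:
        feed.append({
            # Hypothetical id scheme: make the identifier unique per feed.
            "identifier": f"{identifier}:{set_spec}",
            "deleted": set_spec not in set_specs,
        })
    return feed


records = [("oai:example:1", ["a"]), ("oai:example:2", ["b"])]

# A harvester merging both feeds into one store keyed on identifier.
store = {}
for set_spec in ("a", "b"):
    for entry in feed_for_set(records, set_spec):
        store[entry["identifier"]] = entry["deleted"]
```

Because each feed uses its own identifiers, the deleted marker from one
feed no longer overwrites the live record from another.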
Jasper Op de Coul -- Erasmus University Rotterdam
t +31 10 4082922 -- http://eur.nl/ub
Burgemeester Oudlaan 50 3062 PA Rotterdam -- The Netherlands
The information in this e-mail message is confidential and may be legally
privileged. Read more: http://www.eur.nl/english/email-disclaimer