[OAI-implementers] harvester guidelines

Jasper Op de Coul opdecoul at ubib.eur.nl
Thu May 26 11:54:30 EDT 2011

On 05/26/2011 03:02 PM, McGath, Gary wrote:
>> From: oai-implementers-bounces at openarchives.org [mailto:oai-
>> implementers-bounces at openarchives.org] On Behalf Of Jasper Op de Coul
>> Sent: Thursday, May 26, 2011 6:44 AM
>> To: OAI-implementers at openarchives.org
>> Subject: [OAI-implementers] harvester guidelines
>> Hi list,
>> I've been doing some work with OAIPMH harvesters lately, and would like
>> to share some of my experiences on the subject.
>> When harvesting specific sets with the `set` param, there is an issue
>> that a harvester is not notified when a record is removed from that
>> set.
>> I think most implementers are aware of this, and it is the biggest hole
>> in the specification.
>> For example: A specific set is harvested, but at a later time one of
>> the records is no longer part of that set. The record then disappears
>> from the feed, but the harvester is never notified because there is no
>> delete event.
> Implementers of services could avoid this problem by adopting a policy of never removing a record from a set. If its placement in a set turns out to be erroneous or outdated, the service would delete and re-add the record. Of course, this only helps with the services that adopt and announce the policy, and uprooting the old record could be a problem in some scenarios, but it sounds like a reasonable policy to adopt, with the advantage outweighing the downside.
> One problem I can think of is that it could get messy if there are major changes in set organization, resulting in large numbers of bookkeeping deletions.

Yes, solving it with a policy is probably the best option, especially if 
your sets don't change that often.
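
To make the policy concrete, here is a minimal Python sketch. It is a toy
in-memory repository, not code from any real server: `Repository`,
`change_sets` and the identifiers are made up for illustration, and the
datestamp is a simple counter. It assumes that "delete and re-add" means the
old identifier gets a deleted header (keeping its setSpecs) and the record
comes back under a new identifier.

```python
from dataclasses import dataclass, field


@dataclass
class Header:
    identifier: str
    datestamp: int            # simplified: a counter instead of a UTC datestamp
    set_specs: frozenset
    deleted: bool = False


@dataclass
class Repository:
    headers: dict = field(default_factory=dict)
    clock: int = 0

    def _tick(self):
        self.clock += 1
        return self.clock

    def add(self, identifier, set_specs):
        self.headers[identifier] = Header(identifier, self._tick(), frozenset(set_specs))

    def change_sets(self, identifier, new_identifier, new_set_specs):
        """Policy: never drop a setSpec from a live record.

        The old record is marked deleted but keeps its setSpecs, so a
        selective harvest of the old set still sees the deletion. The
        record is then re-added under a new (hypothetical) identifier.
        """
        old = self.headers[identifier]
        self.headers[identifier] = Header(identifier, self._tick(), old.set_specs, deleted=True)
        self.add(new_identifier, new_set_specs)

    def list_identifiers(self, set_spec, from_=0):
        """Selective, incremental harvest: headers in the set changed after from_."""
        return [h for h in self.headers.values()
                if set_spec in h.set_specs and h.datestamp > from_]


repo = Repository()
repo.add("oai:example:1", {"physics"})
first = repo.list_identifiers("physics")
last_harvest = repo.clock

# The record moves from "physics" to "math".
repo.change_sets("oai:example:1", "oai:example:1b", {"math"})
later = repo.list_identifiers("physics", from_=last_harvest)
```

With this policy the second harvest of "physics" returns the deleted header
for oai:example:1, so the harvester learns that the record is gone from that
set.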

It is possible to solve the problem on the service side, but it is not 
trivial, and kind of a hack:

If a server receives a request with a set parameter, it could respond by 
returning not only the records from that set, but all records in the 
repository, marking every record outside the chosen set as deleted.
This would be confusing for a client, since the server returns records 
that are not in the set the client asked for. So the server should also 
add the requested setSpec to all of those other records. Adding the 
setSpec and the deleted status to the headers would be trivial to do in 
the HTTP server, and neither should be stored in the database.
However, this scenario could lead to problems if a client does multiple 
harvests of different sets. In that case a record could be marked as 
deleted in one set, while it is not deleted in another set. If the 
harvested data is stored in one database (which is common), these 
records would overwrite each other.
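
A small Python sketch of this hack and of the overwrite problem, under
simplifying assumptions: the repository is a plain dict, headers are dicts,
and the harvester keys its single database on the OAI identifier. The
function names are hypothetical and nothing here is taken from an existing
server.

```python
def list_records_for_set(repository, requested_set):
    """Answer a set request with *all* records.

    repository maps identifier -> set of setSpecs. Records outside the
    requested set are flagged deleted, and the requested setSpec is added
    to them at response time only (the repository is not modified).
    """
    response = []
    for identifier, set_specs in sorted(repository.items()):
        response.append({
            "identifier": identifier,
            "setSpec": sorted(set_specs | {requested_set}),
            "deleted": requested_set not in set_specs,
        })
    return response


def harvest_into(store, response):
    """A naive harvester: one database, keyed on the identifier."""
    for header in response:
        store[header["identifier"]] = header


repository = {
    "oai:example:1": {"physics"},
    "oai:example:2": {"math"},
}

store = {}
harvest_into(store, list_records_for_set(repository, "physics"))
harvest_into(store, list_records_for_set(repository, "math"))

# oai:example:1 is a live record in "physics", but the later "math"
# harvest has overwritten it with a deleted header.
clobbered = store["oai:example:1"]["deleted"]
```

After both harvests `clobbered` is true: whichever set is harvested last
decides whether the record looks deleted.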

In the MOAI server we can make many OAI-PMH feeds out of one OAI-PMH feed, 
based on the setSpec headers. Every set could basically have its own 
OAI-PMH feed that contains just the records from that set, with all other 
records marked as deleted. The harvester could then harvest the feed 
without the need to specify a set parameter. Furthermore, each of these 
OAI-PMH feeds could use slightly different oai:ids so that there would 
not be any collisions when the harvested data is merged into a single 

This does not completely solve the problem since you have to get 
harvesters to use these different feeds instead of harvesting the 'main' 
feed with set params. But for harvesters that use these feeds you have 
eliminated the problem, without too much bookkeeping.
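
Roughly, the per-set feeds could look like the Python sketch below. This is
not MOAI's actual code or API: `feed_for_set` and the identifier scheme
(`oai:example:<set>:<local id>`) are assumptions for illustration, and
datestamps and incremental harvesting are left out.

```python
def feed_for_set(repository, set_spec):
    """Build the feed for one set.

    repository maps a local id -> set of setSpecs. Every record appears in
    the feed; records outside the set are flagged deleted. Identifiers are
    namespaced per feed (hypothetical scheme) so feeds cannot collide.
    """
    feed = {}
    for local_id, set_specs in repository.items():
        identifier = f"oai:example:{set_spec}:{local_id}"
        feed[identifier] = {"deleted": set_spec not in set_specs}
    return feed


repository = {"1": {"physics", "math"}, "2": {"math"}}

# First harvest of both feeds into one database, no set parameter needed.
store = {}
store.update(feed_for_set(repository, "physics"))
store.update(feed_for_set(repository, "math"))

# Record 1 leaves "physics"; only the physics feed reports a deletion.
repository["1"] = {"math"}
store.update(feed_for_set(repository, "physics"))
store.update(feed_for_set(repository, "math"))
```

Because each feed has its own identifiers, the deletion reported by the
physics feed does not touch the copy of record 1 harvested from the math
feed.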

Jasper Op de Coul -- Erasmus University Rotterdam
t +31 10 4082922  -- http://eur.nl/ub
Burgemeester Oudlaan 50 3062 PA Rotterdam -- The Netherlands

