[OAI-implementers] Selective Harvesting OAI-PMH Global Harvesters

Frederic MERCEUR Frederic.Merceur at ifremer.fr
Mon Aug 11 07:37:30 EDT 2008

Dear Atanu,

Avano <http://www.ifremer.fr/avano/> is indeed a thematic OAI harvester 
for aquatic and marine science.

First, Avano harvests a few repositories from various aquatic science
research institutes. All resources stored in those specialized
repositories are systematically and automatically referenced in Avano.
But only 20% of the records available via Avano come from harvesting
these aquatic repositories.

Avano also queries a group of Open Archives that are not specialized in
aquatic sciences but contain relevant resources. This is the case for
the PubMed Central server, which specializes in biomedical and life
sciences yet provides more than 18,000 records relevant to Avano's
research fields.

In theory, the thematic harvesting of a repository should be made 
possible by using the Set option of the OAI-PMH protocol. Nevertheless, 
in reality, we have never found any “Marine and Aquatic Sciences” Set in 
any of the harvested repositories. In order to filter those
repositories, we have developed a search system based on key-words and
key-expressions related to aquatic sciences.
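
For readers who want to check what Sets a repository exposes, here is a
minimal Python sketch (not Avano's code). It builds a ListRecords
request restricted to a Set and scans a ListSets response for Set names
that mention given terms. The base URL and the sample response are
made up for illustration; only the verbs, the "set" and
"metadataPrefix" arguments and the XML namespace come from OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, set_spec=None, metadata_prefix="oai_dc"):
    """Build a ListRecords request, optionally restricted to one Set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def matching_sets(list_sets_xml, terms):
    """Return (setSpec, setName) pairs whose name mentions any term."""
    root = ET.fromstring(list_sets_xml)
    found = []
    for node in root.iterfind(".//oai:set", OAI_NS):
        spec = node.findtext("oai:setSpec", default="", namespaces=OAI_NS)
        name = node.findtext("oai:setName", default="", namespaces=OAI_NS)
        if any(term.lower() in name.lower() for term in terms):
            found.append((spec, name))
    return found


# Hypothetical ListSets response from an imaginary repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>phys</setSpec><setName>Physics</setName></set>
    <set><setSpec>bio</setSpec><setName>Biology</setName></set>
  </ListSets>
</OAI-PMH>"""

if __name__ == "__main__":
    # No Set mentions our field, which is the situation described above.
    print(matching_sets(SAMPLE, ["marine", "aquatic"]))
    print(list_records_url("http://example.org/oai", set_spec="bio"))
```

When no Set matches, the only remaining option with plain OAI-PMH is to
harvest without the "set" argument and filter the records afterwards.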

To process repositories that are not perfectly categorized within our
fields of interest, Avano loads all of their records into a temporary
database.
Those data are indexed, and a daily automatic job then searches each
record for about 100,000 scientific names of aquatic species. For
example, if a record contains the character string Crassostrea gigas
(scientific name of an oyster species), we consider that there is hardly
any chance that this name is used outside our field of interest, so the
record automatically becomes visible in Avano.
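
As an illustration only (this is not Avano's implementation), the
automatic acceptance step could be sketched in Python as below. The
sketch assumes every scientific name is a two-word binomial, so a
record can be checked by looking up each pair of consecutive words in a
set rather than scanning for 100,000 strings one by one. The function
names and the two-entry name list are hypothetical.

```python
import re


def normalise(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_name_index(names):
    """Turn a list of scientific names into a set of word tuples."""
    return {tuple(normalise(name)) for name in names}


def species_mentions(record_text, name_index):
    """Return the scientific names (lowercased) found in a record.

    Assumption: names are exactly two words long, so only pairs of
    consecutive words are looked up. Longer or shorter names in the
    index are never matched by this sketch.
    """
    words = normalise(record_text)
    hits = []
    for pair in zip(words, words[1:]):
        if pair in name_index and pair not in hits:
            hits.append(pair)
    return [" ".join(pair) for pair in hits]


if __name__ == "__main__":
    index = build_name_index(["Crassostrea gigas", "Ostrea edulis"])
    record = "Growth of Crassostrea gigas in estuaries"
    # A non-empty result means the record can be accepted automatically.
    print(species_mentions(record, index))
```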

Avano also searches for a few hundred more general terms and
expressions related to the aquatic environment. For example, Avano
searches for the words fish, marine, fishing, water treatment... Records
spotted by this key-word system are then manually validated by
librarians before they can be viewed via Avano. To validate those
records, librarians use a dedicated website in which the key-words found
in each record are highlighted. This allows librarians to reject records
whose key-words are not related to their fields of interest (for example
when FISH is used for fluorescence in situ hybridization).
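
A rough Python sketch of this second filter, again as an illustration
rather than Avano's actual code: records containing a general term are
placed in a review queue with the matched terms highlighted, and remain
"pending" until a librarian decides. The term list is taken from the
examples above; the queue layout, the "**" highlight marker and the
function names are assumptions.

```python
import re

# Example terms from the message; the real list has a few hundred.
GENERAL_TERMS = ["fish", "marine", "fishing", "water treatment"]


def compile_terms(terms):
    """Build one case-insensitive pattern matching any whole term."""
    ordered = sorted(terms, key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)


def highlight(text, pattern, mark="**"):
    """Wrap every matched term in a marker so reviewers spot it quickly."""
    return pattern.sub(lambda m: mark + m.group(0) + mark, text)


def queue_for_review(records, pattern):
    """Return matching records, highlighted, awaiting manual validation."""
    queue = []
    for record_id, text in records:
        if pattern.search(text):
            queue.append({
                "id": record_id,
                "text": highlight(text, pattern),
                "status": "pending",  # a librarian accepts or rejects later
            })
    return queue


if __name__ == "__main__":
    pattern = compile_terms(GENERAL_TERMS)
    records = [
        ("a", "FISH analysis of chromosomes"),
        ("b", "Quantum optics"),
        ("c", "Marine water treatment plants"),
    ]
    for item in queue_for_review(records, pattern):
        print(item["id"], item["text"])
```

Record "a" in this example is exactly the kind of false positive that
the manual step is there to reject.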

Of course, this method is far from being ideal:
- This method partially relies on a manual sorting of the records which 
requires some time (a few minutes per day to filter the new files among 
the 150 repositories already recorded, plus extra time to process the 
back-log when new repositories are recorded).
- As we spend no more than 2 or 3 seconds deciding whether to validate
a file, we may accept a small percentage of records that are not related
to Avano's fields of interest…

Kind regards,

Atanu Garai wrote:
> *Apologies for cross-posting*
> Dear Colleagues
> Globethics.net intends to harvest all ethics related metadata from
> open repositories around the world and interpolate the same as part of
> the digital library. We feel that this would be a great service towards
> fulfilling the information and knowledge needs and exchange for the
> global ethics community. In so doing, we have studied few alternatives
> and solutions, as given below:
> 1. OAI-PMH 2.0 specification and implementation guidelines:
> The original OAI-PMH 2.0 specification and implementation guideline for
> 'service providers' like harvesters/aggregators provides steps towards
> implementing harvesting engine. The only way to provide subject (or
> keyword) related metadata retrieval, according to this guideline, is to
> specify the subject in the Set. A closer examination in the set-spec,
> as available in the ROAR
> (http://roar.eprints.org/) tells us that 'ethics'
> as subject does not appear in the data providers that I have surveyed
> so far. The conclusion is that using OAI-PMH 2.0 implementation
> guidelines we will not be able to harvest metadata in this domain in an
> optimal fashion.
> 2. The second strategy is the strategy followed by AVANO -
> http://www.ifremer.fr/avano/ - a harvester in the domain of aquatic and
> marine sciences. Essentially, they aggregate all the metadata in a
> temporary (internal) database, run a search query and then interpolate
> the relevant records onto their AVANO public interface. This is an
> advantageous proposition for a subject-specialist harvester, but we are
> constrained by resources to implement this strategy.
> 3. The third way, for which I have not found any implementation example
> so far, is to take the relevant metadata from already existing global
> harvesters like OAI and interpolate it into the Globethics.net server.
> The current global harvesters that we are examining are OAISTER and
> Scientific Commons. However, I would like to know the possible
> standardized mechanisms by which we can take relevant metadata
> (searching with the word 'ethics' in Scientific Commons gets 75000+
> records) from these harvesters and ingest it into our database.
> Thank you for your time to reflect on these issues.
> Regards
> Atanu Garai
> Globethics.net
> International Secretariat
> 150, route de Ferney
> CH-1211 Geneva 2
> Switzerland
> Tel.: +41 22 791 62 49
> Fax: +41 22 710 23 86
> Web: www.globethics.net
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://www.openarchives.org/mailman/listinfo/oai-implementers

Fred Merceur
Ifremer / Bibliothèque La Pérouse
frederic.merceur at ifremer.fr
Tel.: 02-98-49-88-69
Fax : 02-98-49-88-84
Bibliothèque La Pérouse <http://www.ifremer.fr/blp/>
Archimer, Ifremer's Institutional Repository 
Avano, a marine and aquatic OAI harvester <http://www.ifremer.fr/avano/>

*Before printing, think about the environment!*