From atanugarai.lists at gmail.com Thu Aug 7 08:56:37 2008 From: atanugarai.lists at gmail.com (Atanu Garai) Date: Thu Aug 7 08:56:51 2008 Subject: [OAI-implementers] Selective Harvesting OAI-PMH Global Harvesters Message-ID: <489AF105.6070401@gmail.com> *Apologies for cross-posting* Dear Colleagues Globethics.net intends to harvest all ethics related metadata from open repositories around the world and interpolate the same as part of the digital library. We feel that this would be a great service towards fulfilling the information and knowledge needs and exchange for the global ethics community. In so doing, we have studied few alternatives and solutions, as given below: 1. OAI-PMH 2.0 specification and implementation guidelines: The original OAI-PMH 2.0 specification and implementation guideline for 'service providers' like harvesters/aggregators provides steps towards implementing harvesting engine. The only way to provide subject (or keyword) related metadata retrieval, according to this guideline, is to specify the subject in the Set. A closer examination in the set-spec, as available in the ROAR (http://roar.eprints.org/) tells us that 'ethics' as subject does not appear in the data providers that I have surveyed so far. The conclusion is that using OAI-PMH 2.0 implementation guidelines we will not be able to harvest metadata in this domain in an optimal fashion. 2. The second strategy is the strategy followed by AVANO - http://www.ifremer.fr/avano/ - a harvester in the domain of aquatic and marine sciences. Essentially, they aggregate all the metadata in a temporary (internal) database, run a search query and then interpolate the relevant records onto their AVANO public interface. This is a advantageous proposition for subject-specialist harvester, but we are constrained by resources to implement this strategy. 3. The third way, which I have not found any implementation example so far, is to take the relevant metadata from already existing global harvesters like OAI and interpolate into Globethics..net server. The current global harverster that we are examining are - OAISTER and Scientific Commons. However, I would like to know the possible standardized mechanisms by which we can take relevant (searching with the word 'ethics' in Scientific Commons gets 75000+ records) metadata from these harvestors and ingest in our database. Thank you for your time to reflect on this issues. Regards Atanu Garai Globethics.net International Secretariat 150, route de Ferney CH-1211 Geneva 2 Switzerland Tel.: +41 22 791 62 49 Fax: +41 22 710 23 86 Web: www.globethics.net From khage at umich.edu Thu Aug 7 15:18:25 2008 From: khage at umich.edu (Kat Hagedorn) Date: Thu Aug 7 15:15:44 2008 Subject: [OAI-implementers] Selective Harvesting OAI-PMH Global Harvesters In-Reply-To: <489AF105.6070401@gmail.com> Message-ID: Hello Atanu, The best method for accessing what you need from OAIster is to do appropriate searches* , determine which repositories have records that will suit your purposes (column on the left side of the results page), and then ask me for a list of the OAI baseURLs of those repositories. Alternatively, you can try and find those repositories at: http://gita.grainger.uiuc.edu/registry, but asking me may prove easier and faster. * e.g., a search for "ethic*" in the Subject field in OAIster yields 10,417 records, but that may be too narrow a search for you Please let me know if you have further questions. Regards, -Kat ------------------- Kat Hagedorn OAIster/Metadata Harvesting Librarian DLXS Bibliographic Class Coordinator Digital Library Production Service University of Michigan http://www.oaister.org/ http://www.dlxs.org/ email: khage@umich.edu phone: 734-615-7618 On 8/7/08 8:56 AM, "Atanu Garai" wrote: > *Apologies for cross-posting* > > Dear Colleagues > > Globethics.net intends to harvest all ethics related metadata from > open repositories around the world and interpolate the same as part of > the digital library. We feel that this would be a great service towards > fulfilling the information and knowledge needs and exchange for the > global ethics community. In so doing, we have studied few alternatives > and solutions, as given below: > > 1. OAI-PMH 2.0 specification and implementation guidelines: > The original OAI-PMH 2.0 specification and implementation guideline for > 'service providers' like harvesters/aggregators provides steps towards > implementing harvesting engine. The only way to provide subject (or > keyword) related metadata retrieval, according to this guideline, is to > specify the subject in the Set. A closer examination in the set-spec, > as available in the ROAR > (http://roar.eprints.org/) tells us that 'ethics' > as subject does not appear in the data providers that I have surveyed > so far. The conclusion is that using OAI-PMH 2.0 implementation > guidelines we will not be able to harvest metadata in this domain in an > optimal fashion. > > 2. The second strategy is the strategy followed by AVANO - > http://www.ifremer.fr/avano/ - a harvester in the domain of aquatic and > marine sciences. Essentially, they aggregate all the metadata in a > temporary (internal) database, run a search query and then interpolate > the relevant records onto their AVANO public interface. This is a > advantageous proposition for subject-specialist harvester, but we are > constrained by resources to implement this strategy. > > 3. The third way, which I have not found any implementation example so > far, is to take the relevant metadata from already existing global > harvesters like OAI and interpolate into Globethics..net server. The > current global harverster that we are examining are - OAISTER and > Scientific Commons. However, I would like to know the possible > standardized mechanisms by which we can take relevant (searching with > the word 'ethics' in Scientific Commons gets 75000+ records) metadata from > these harvestors and ingest in our database. > > Thank you for your time to reflect on this issues. > > Regards > Atanu Garai > Globethics.net > International Secretariat > 150, route de Ferney > CH-1211 Geneva 2 > Switzerland > Tel.: +41 22 791 62 49 > Fax: +41 22 710 23 86 > Web: www.globethics.net > > > _______________________________________________ > OAI-implementers mailing list > List information, archives, preferences and to unsubscribe: > http://www.openarchives.org/mailman/listinfo/oai-implementers > ------------------- Kat Hagedorn OAIster/Metadata Harvesting Librarian DLXS Bibliographic Class Coordinator Digital Library Production Service University of Michigan http://www.oaister.org/ http://www.dlxs.org/ email: khage@umich.edu phone: 734-615-7618 From Frederic.Merceur at ifremer.fr Mon Aug 11 07:37:30 2008 From: Frederic.Merceur at ifremer.fr (Frederic MERCEUR) Date: Mon Aug 11 07:37:36 2008 Subject: [OAI-implementers] Selective Harvesting OAI-PMH Global Harvesters In-Reply-To: <489AF105.6070401@gmail.com> References: <489AF105.6070401@gmail.com> Message-ID: <48A0247A.502@ifremer.fr> Dear Atanu, Avano is indeed a thematic OAI harvester for aquatic and marine science. Then Avano harvests a few repositories from different aquatic sciences research institutes. All resources stored in those specialized repositories are systematically and automatically referenced in Avano. But only 20% of the records available via Avano come from harvesting of these aquatic repositories. Avano also interrogates a group of Open Archives not specialized in aquatic sciences which contain relevant resources. This is the case for the PubMed Central server, which specializes in biomedical sciences and life sciences, provides more than 18.000 records are relevant to Avano?s research fields. In theory, the thematic harvesting of a repository should be made possible by using the Set option of the OAI-PMH protocol. Nevertheless, in reality, we have never found any ?Marine and Aquatic Sciences? Set in any of the harvested repositories. In order to filter those repositories, we have developed a research system based on key-words and key-expressions related to aquatic sciences. To process repositories that are not perfectly categorized within our fields of interest, Avano uploads all of their records in a temporary database. Those data are indexed before a daily automatic system searches for about 100.000 scientific names of aquatic species in the record. For example, if a record contains the character string Crassostrea gigas (scientific name of an oyster species), we consider that there is hardly any chance that this name is used in a different context than our field of interest, so it will be automatically visible in Avano. Avano also searches for a few hundred of more general terms and expressions related to the aquatic environment. For example, Avano searches for the words fish, marine, fishing, water treatment... Records spotted by this key-word system are then manually validated by librarians before they can be viewed via Avano. To validate those records, librarians use a specific website. Key-words found in records are highlighted. This system allows librarians to reject index files when key-words are not related to their fields of interest (for example when FISH is used for fluorescence in situ hybridization). Of course, this method is far from being ideal: - This method partially relies on a manual sorting of the records which requires some time (a few minutes per day to filter the new files among the 150 repositories already recorded, plus extra time to process the back-log when new repositories are recorded). - As we do not spend more than 2 or 3 seconds to either validate a file or not, we may accept a low percentage of records that are not related to Avano?s fields of interest? Kind regards, Fred Atanu Garai a ?crit : > *Apologies for cross-posting* > > Dear Colleagues > > Globethics.net intends to harvest all ethics related metadata from > open repositories around the world and interpolate the same as part of > the digital library. We feel that this would be a great service towards > fulfilling the information and knowledge needs and exchange for the > global ethics community. In so doing, we have studied few alternatives > and solutions, as given below: > > 1. OAI-PMH 2.0 specification and implementation guidelines: > The original OAI-PMH 2.0 specification and implementation guideline for > 'service providers' like harvesters/aggregators provides steps towards > implementing harvesting engine. The only way to provide subject (or > keyword) related metadata retrieval, according to this guideline, is to > specify the subject in the Set. A closer examination in the set-spec, > as available in the ROAR > (http://roar.eprints.org/) tells us that 'ethics' > as subject does not appear in the data providers that I have surveyed > so far. The conclusion is that using OAI-PMH 2.0 implementation > guidelines we will not be able to harvest metadata in this domain in an > optimal fashion. > > 2. The second strategy is the strategy followed by AVANO - > http://www.ifremer.fr/avano/ - a harvester in the domain of aquatic and > marine sciences. Essentially, they aggregate all the metadata in a > temporary (internal) database, run a search query and then interpolate > the relevant records onto their AVANO public interface. This is a > advantageous proposition for subject-specialist harvester, but we are > constrained by resources to implement this strategy. > > 3. The third way, which I have not found any implementation example so > far, is to take the relevant metadata from already existing global > harvesters like OAI and interpolate into Globethics..net server. The > current global harverster that we are examining are - OAISTER and > Scientific Commons. However, I would like to know the possible > standardized mechanisms by which we can take relevant (searching with > the word 'ethics' in Scientific Commons gets 75000+ records) metadata > from > these harvestors and ingest in our database. > > Thank you for your time to reflect on this issues. > > Regards > Atanu Garai > Globethics.net > International Secretariat > 150, route de Ferney > CH-1211 Geneva 2 > Switzerland > Tel.: +41 22 791 62 49 > Fax: +41 22 710 23 86 > Web: www.globethics.net > > > _______________________________________________ > OAI-implementers mailing list > List information, archives, preferences and to unsubscribe: > http://www.openarchives.org/mailman/listinfo/oai-implementers > -- Fred Merceur Ifremer / Biblioth?que La P?rouse frederic.merceur@ifremer.fr T?l : 02-98-49-88-69 Fax : 02-98-49-88-84 Biblioth?que La P?rouse Archimer, Ifremer's Institutional Repository Avano, a marine and aquatic OAI harvester *Avant d'imprimer, pensez ? l'environnement!* -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.openarchives.org/pipermail/oai-implementers/attachments/20080811/854de607/attachment.htm From atanugarai.lists at gmail.com Tue Aug 12 02:55:35 2008 From: atanugarai.lists at gmail.com (atanu garai) Date: Tue Aug 12 02:55:39 2008 Subject: [OAI-implementers] Selective Harvesting OAI-PMH Global Harvesters In-Reply-To: <48A0247A.502@ifremer.fr> References: <489AF105.6070401@gmail.com> <48A0247A.502@ifremer.fr> Message-ID: Dear Fred, Kate, All, Thank you for providing us with alternative methodologies. We are validating different options and would come back to you once we define appropriate possibilities at our end. Best regards Atanu -- Atanu Garai Globethics.net 2008/8/11 Frederic MERCEUR : > Dear Atanu, > > Avano is indeed a thematic OAI harvester for aquatic and marine science. > > Then Avano harvests a few repositories from different aquatic sciences > research institutes. All resources stored in those specialized repositories > are systematically and automatically referenced in Avano. But only 20% of > the records available via Avano come from harvesting of these aquatic > repositories. From herbertv at lanl.gov Wed Aug 13 09:53:17 2008 From: herbertv at lanl.gov (Herbert Van de Sompel) Date: Wed Aug 13 09:53:18 2008 Subject: [OAI-implementers] OAI6, Geneva: 17-19 June 2009 Message-ID: <48A2E74D.9030501@lanl.gov> (Apologies for Cross Posting) Make a place in your diaries for the next OAI workshop: OAI6, the 6th Workshop on Innovations in Scholarly Communication, which will be held in Geneva, Switzerland: Wednesday 17th to Friday 19th June 2009. The workshop will follow the successful format of previous workshops mixing practical tutorials, presentations from cutting-edge projects and research, discussion groups, posters, and an intense social programme to maximise interaction and communication. It will be possible to register for a part or all of the programme. The workshop is aimed at those involved in the development of open access (OA) repositories and who can influence the direction of developments either within their institution, their country or at an international level - that includes technical developers of OA bibliographic databases and connected services, research information policy developers at university or library level, funding bodies concerned with access to the results of their research, OA publishers, and influential researchers keen to lead OA developments in their own field. Previous workshops have built a strong community spirit and the event is a unique opportunity to exchange ideas and contact details with the wide range of people connected to the OA movement. The OAI series of workshops is one of the biggest international meetings in this field and takes place roughly every two years. http://www.unige.ch/workshop/oai6/ Further information will be added to this web site, including programme details, a registration form, a call for posters, and accommodation and travel advice. The 2009 workshop will be held at a larger venue belonging to the University of Geneva (UniGe) who we welcome as co-organisers of this event. Future announcements may be made directly from the UniGe organisers. Some slides of the new venue can be seen here: http://www.unige.ch/workshop/oai6/diaporama-mail.php The committee looks forward to welcoming you to Geneva next year for another successful event. ******************************************** Joanne Yeomans On behalf of the OAI6 Organising Committee http://www.unige.ch/workshop/oai6/ Email: oaiworkshop-organisation@unige.ch -- Herbert Van de Sompel Digital Library Research & Prototyping Los Alamos National Laboratory, Research Library http://public.lanl.gov/herbertv/ tel. +1 505 667 1267 From spalding at senylrc.org Thu Aug 28 14:40:54 2008 From: spalding at senylrc.org (Zachary Spalding) Date: Thu Aug 28 14:41:22 2008 Subject: [OAI-implementers] Limit number of records return Message-ID: I have been reading through the OAI documentation and I can't find where I can limit the number of records return. I see that I can do it by date but in my case I need to do it my the records. These our my two objectives 1. I want to grab the first 20 records in a collection 2. I want to be able to define a new starting point like start at record 21 and grab the next 20 records. Below is the URL I am using to retrieve a set of records from a collection http://www.hrvh.org/cgi-bin/oai.exe? verb=ListRecords&metadataPrefix=oai_dc&set=bard Any ideas if this can be done, or where in the documentation I should be look at to accomplish this? Thanks Zachary Spalding -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.openarchives.org/pipermail/oai-implementers/attachments/20080828/74f6db64/attachment.htm From mln at cs.odu.edu Thu Aug 28 14:59:00 2008 From: mln at cs.odu.edu (Michael Nelson) Date: Thu Aug 28 14:59:04 2008 Subject: [OAI-implementers] Limit number of records return In-Reply-To: References: Message-ID: Zachary, In OAI-PMH the harvester does not decide how many records you get back. Instead it is the repository that decides how many records to return for a ListRecords response (or headers in a ListIdentifiers response). If you really wanted to, you could do a ListIdentifiers request and then do a GetRecord request on each of the first N records. But it is probably easiest to do a ListRecords and only parse the first N records from the response. regards, Michael On Thu, 28 Aug 2008, Zachary Spalding wrote: > I have been reading through the OAI documentation and I can't find where I can limit > the number of records return. ?I see that I can do it by date but in my case I need to > do it my the records. > These our my two objectives > > 1. ?I want to grab the first 20 records in a collection > 2. I want to be able to define a new starting point like start at record 21 and grab > the next 20 records. > > Below is the URL I am using to retrieve a set of records from a collection > > http://www.hrvh.org/cgi-bin/oai.exe?verb=ListRecords&metadataPrefix=oai_dc&set=bard > > Any ideas if this can be done, or where in the documentation I should be look at to > accomplish this? > > Thanks > > Zachary Spalding > > > ---- Michael L. Nelson mln@cs.odu.edu http://www.cs.odu.edu/~mln/ Dept of Computer Science, Old Dominion University, Norfolk VA 23529 +1 757 683 6393 +1 757 683 4900 (f)