[OAI-implementers] Experimental OAI Registry at UIUC

Thomas G. Habing thabing@uiuc.edu
Thu, 09 Oct 2003 12:52:22 -0500


Hi all,

This is to announce the availability of a new experimental registry of OAI 
providers.  The registry can be found at:

   http://oai.grainger.uiuc.edu/registry/

The registry was constructed by collecting the baseURLs of all the providers 
we could from various ListFriends.pl sites, Hussein's repository explorer, 
etc., as well as a search tool developed using the Google SOAP API to search 
for possible baseURLs (surprisingly this yielded 30+ new provider sites). 
Once this list of baseURLs was compiled, a crawler harvested select data 
from each provider, such as Identify, ListSets, ListMetadataFormats, as well 
as a collection of sample records, and record counts for each combination of 
set and metadata format (if possible).  The crawler also traversed to 
baseURLs found in friends containers or via the provenance container from 
sample records.  This resulted in a list of about 340 OAI providers which 
are able to respond, plus about 90 providers which seem to be down.  Many of 
these are the same provider exposed under different versions of the 
protocol; still, this list is about twice as big as any other list I've come 
across.
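
The requests such a crawl issues against each baseURL can be sketched 
roughly as follows.  This is an illustration only, not the registry's actual 
crawler: the function names are made up for the example, and it assumes the 
baseURL carries no query string of its own.  The verbs and the 
metadataPrefix/set arguments are the standard OAI-PMH ones.

```python
from urllib.parse import urlencode


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL for one verb plus optional arguments."""
    query = {"verb": verb}
    # Skip arguments that were not supplied (e.g. no set restriction).
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)


def probe_urls(base_url):
    """URLs describing a provider: Identify, ListSets, ListMetadataFormats."""
    verbs = ("Identify", "ListSets", "ListMetadataFormats")
    return [oai_request_url(base_url, verb) for verb in verbs]


def sample_records_url(base_url, metadata_prefix, set_spec=None):
    """URL for sample records in one metadata format, optionally one set."""
    return oai_request_url(
        base_url, "ListRecords", metadataPrefix=metadata_prefix, set=set_spec
    )
```

Calling sample_records_url once per (set, metadata format) pair gives the 
combinations for which record counts would be gathered.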

Plus, it is searchable, which was my primary goal: to make it easier to find 
relevant OAI providers.  Essentially I have constructed a full-text index on 
each repository's Identify, ListSets, and ListMetadataFormats responses, plus 
a collection of sample records.
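
For readers unfamiliar with the idea, a toy inverted index over the 
harvested text looks something like this.  It is a sketch under simple 
assumptions (one blob of text per baseURL, lowercase alphanumeric tokens, 
all query terms must match) and says nothing about how the registry's own 
index is built; all names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of baseURLs whose harvested text contains it.

    documents: dict of baseURL -> concatenated text (Identify, ListSets,
    ListMetadataFormats responses and sample records).
    """
    index = defaultdict(set)
    for base_url, text in documents.items():
        for term in tokenize(text):
            index[term].add(base_url)
    return index


def search(index, query):
    """Return the baseURLs whose text contains every term of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result
```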

I have also done some analysis and made various reports available, such as 
repositories which support compression, a count of the most frequently 
occurring top-level domains, etc.  Check out the "graph of friends" showing 
a graphical representation of interconnections between repositories either 
via friends or provenance.  Did you know that there are 52 distinct XML 
metadata schemas in use by OAI repositories?

Based on this database I have also developed an experimental OAI redirector.  
If you have an OAI identifier (e.g. oai:PITTAEI.OAI2:558) but don't know 
where it came from, submit it to

   http://oai.grainger.uiuc.edu/registry/rx?oai:PITTAEI.OAI2:558

and you will be redirected to the oai_dc format record for that id, if an 
appropriate baseURL can be found in the registry database.  Unfortunately it 
appears that many sites, especially the GenericEprints, are using the same 
repo identifier (see http://oai.grainger.uiuc.edu/registry/ListRepoIds.asp), 
so if there are multiple possible baseURLs for a given id, I have a ranking 
algorithm that attempts to guess the best one.  I may do some more work on 
this in the future, maybe looking at an OpenURL-type resolver function.
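
In outline, the lookup amounts to something like the sketch below.  It is 
only an illustration: the in-memory registry dict, the numeric scores, and 
the function names are assumptions made for the example, and picking the 
highest score merely stands in for the ranking step, whose details are not 
given here.  The GetRecord request and its arguments are standard OAI-PMH.

```python
from urllib.parse import urlencode


def repository_id(oai_identifier):
    """Return the repo part of an identifier shaped oai:<repo>:<local id>."""
    parts = oai_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError("not an oai:<repo>:<local> identifier: " + oai_identifier)
    return parts[1]


def resolve(oai_identifier, registry):
    """Return a GetRecord URL (oai_dc) for the identifier, or None.

    registry: dict of repo id -> list of (baseURL, score) candidates.
    The score is a hypothetical stand-in for a ranking result.
    """
    candidates = registry.get(repository_id(oai_identifier), [])
    if not candidates:
        return None
    # Several sites may share one repo id; take the best-ranked baseURL.
    base_url, _score = max(candidates, key=lambda c: c[1])
    query = urlencode({
        "verb": "GetRecord",
        "identifier": oai_identifier,
        "metadataPrefix": "oai_dc",
    })
    return base_url + "?" + query
```

A redirector would then answer with an HTTP redirect to the returned URL, 
or report that no baseURL is known when the result is None.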

Anyway, feel free to try it out and let me know of any problems or 
suggestions you might have.  Also, if you know of any more OAI providers I 
should add, let me know.

Kind regards,
	Tom

-- 
Thomas Habing
Research Programmer, Digital Library Projects
University of Illinois at Urbana-Champaign
155 Grainger Engineering Library Information Center, MC-274
thabing@uiuc.edu, (217) 244-4425
http://dli.grainger.uiuc.edu