[OAI-implementers] Experimental OAI Registry at UIUC

Caroline Arms caar@loc.gov
Thu, 9 Oct 2003 17:43:16 -0400 (EDT)


A first glance suggests that this is a very interesting view of the OAI
landscape from many perspectives.  For me, your report on our OAI data
provider at LC provides a useful view of our repository that I can point
others at (e.g. for the record counts).  I'm assuming part of the
motivation for creating it was for UIUC to select content for harvesting.  
I can certainly see its value for potential harvesters who are doing
manual selection of cultural heritage content in focused areas.  I can
imagine the search feature being particularly useful for taking advantage
of rich set descriptions.  

One comment.  It took me some time to see that the link from a number of
records in the count table was actually a way to sample record(s) included
in the indexing.  For understanding why a search generated a hit, I think
it would be useful to be able to get to the complete set of samples for a
repository from the repository summary -- or to get straight to the
samples that generated hits from the hits list.  Just a thought.

As a small aside, special characters are not being handled gracefully.  
When I view http://oai.grainger.uiuc.edu/registry/details.asp?id=103 in
Netscape 7.02 or IE 6.0, the description and title in Cyrillic for the
grabill set are not coming through.  I realize there are a lot of things
that could be wrong (and haven't had time for detective work).  Are others
seeing the same problem?

Thanks again for the tool and summary reports.

Caroline Arms
Office of Strategic Initiatives                  caar@loc.gov
Library of Congress

PS My article from the special issue of Library Hi Tech on OAI 
  "Available and Useful: OAI at the Library of Congress"
is available freely at
[As a federal employee my "work for hire" is automatically in the public
domain within the United States.]

On Thu, 9 Oct 2003, Thomas G. Habing wrote:

> Hi all,
> This is to announce the availability of a new experimental registry of OAI 
> providers.  The registry can be found at:
>    http://oai.grainger.uiuc.edu/registry/
> The registry was constructed by collecting the baseURLs of all the providers 
> we could from various ListFriends.pl sites, Hussein's repository explorer, 
> etc., as well as a search tool developed using the Google SOAP API to search 
> for possible baseURLs (surprisingly this yielded 30+ new provider sites). 
> Once this list of baseURLs was compiled, a crawler harvested select data 
> from each provider, such as Identify, ListSets, ListMetadataFormats, as well 
> as a collection of sample records, and record counts for each combination of 
> set and metadata format (if possible).  The crawler also traversed to 
> baseURLs found in friends containers or via the provenance container from 
> sample records.  This resulted in a list of about 340 OAI providers which 
> are able to respond, plus about 90 providers which seem to be down.  Many of 
> these are versions of the same provider using different versions of the 
> protocol; still, this list is about twice as big as any other list I've come 
> across.
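
[A minimal sketch of the requests such a crawler might issue, assuming
standard OAI-PMH 2.0 query syntax.  The function names and the
example.org baseURL are hypothetical and are not taken from the UIUC
crawler.  Record counts are only obtainable where a provider reports
the optional completeListSize attribute, hence "if possible" above.]

```python
from urllib.parse import urlencode

# The three OAI-PMH verbs the announcement says were harvested from each provider.
VERBS = ("Identify", "ListSets", "ListMetadataFormats")


def _join(base_url, params):
    """Append query parameters to a baseURL, which may already contain '?'."""
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode(params)


def build_requests(base_url):
    """Return one request URL per verb for the given baseURL."""
    return [_join(base_url, {"verb": verb}) for verb in VERBS]


def count_request(base_url, metadata_prefix, set_spec=None):
    """Build a ListIdentifiers request for one (set, format) combination."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return _join(base_url, params)


if __name__ == "__main__":
    base = "http://example.org/oai"  # hypothetical baseURL
    for url in build_requests(base):
        print(url)
    print(count_request(base, "oai_dc", "grabill"))
```
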
> Plus, it is searchable, which was my primary goal, to make it easier to find 
> relevant OAI providers.  Essentially I have constructed a full-text index on 
> each repository's Identify, ListSets, ListMetadataFormats, and a collection 
> of sample records.
> I have also done some analysis and made various reports available, such as 
> repositories which support compression, a count of the most frequently 
> occurring top-level domains, etc.  Check out the "graph of friends" showing 
> a graphical representation of interconnections between repositories either 
> via friends or provenance.  Did you know that there are 52 distinct XML 
> metadata schemas in use by OAI repositories!
> Based on this database I also have developed an experimental OAI redirector. 
>   If you have an OAI identifier (e.g. oai:PITTAEI.OAI2:558) but don't know 
> where it came from, submit it to
>    http://oai.grainger.uiuc.edu/registry/rx?oai:PITTAEI.OAI2:558
> and you will be redirected to the oai_dc format record for that id, if an 
> appropriate baseURL can be found in the registry database.  Unfortunately it 
> appears that many sites, especially the GenericEprints, are using the same 
> repo identifier (see http://oai.grainger.uiuc.edu/registry/ListRepoIds.asp), 
> so if there are multiple possible baseURLs for a given id, I have a ranking 
> algorithm that attempts to guess the best. I may do some more work on this 
> in the future, maybe looking at an OpenURL type resolver function.
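
[A rough sketch of how a redirector along these lines could work,
assuming identifiers of the form oai:<repository id>:<local id>.  The
registry dictionary, the numeric scores, and the example.org baseURLs
are hypothetical; the actual ranking algorithm is not described in the
announcement, so "highest score wins" merely stands in for it.]

```python
from urllib.parse import urlencode


def parse_oai_identifier(identifier):
    """Split 'oai:<repo id>:<local id>' into (repo id, local id)."""
    parts = identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai" or not parts[1] or not parts[2]:
        raise ValueError("not an OAI identifier: %r" % identifier)
    return parts[1], parts[2]


def get_record_url(base_url, identifier, metadata_prefix="oai_dc"):
    """Build a GetRecord request for the identifier at the given baseURL."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": identifier,
        "metadataPrefix": metadata_prefix,
    })
    return base_url + "?" + query


def resolve(identifier, registry):
    """Return a GetRecord URL at the best-ranked baseURL, or None if unknown.

    registry maps a repository id to a list of (baseURL, score) pairs;
    the score is an assumed stand-in for a real ranking.
    """
    repo_id, _local_id = parse_oai_identifier(identifier)
    candidates = registry.get(repo_id, [])
    if not candidates:
        return None
    best_url, _score = max(candidates, key=lambda pair: pair[1])
    return get_record_url(best_url, identifier)


if __name__ == "__main__":
    registry = {  # hypothetical registry contents
        "PITTAEI.OAI2": [("http://example.org/oai", 0.9),
                         ("http://mirror.example.org/oai", 0.4)],
    }
    print(resolve("oai:PITTAEI.OAI2:558", registry))
```
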
> Anyway, feel free to try it out and let me know of any problems or 
> suggestions you might have.  Also, if you know of any more OAI providers I 
> should add, let me know.
> Kind regards,
> 	Tom
> -- 
> Thomas Habing
> Research Programmer, Digital Library Projects
> University of Illinois at Urbana-Champaign
> 155 Grainger Engineering Library Information Center, MC-274
> thabing@uiuc.edu, (217) 244-4425
> http://dli.grainger.uiuc.edu
> _______________________________________________
> OAI-implementers mailing list
> List information, archives, preferences and to unsubscribe:
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers