[OAI-implementers] SOAP-PMH

Young,Jeff jyoung at oclc.org
Tue Dec 7 20:25:11 EST 2004


I tried to come up with a harvesting-based mechanism to work around the
limitations of DP9 for the DSpace community.  Rather than working
interactively with a repository as DP9 does, I harvest the repositories and
create a set-based hierarchy of static HTML pages that I then expose to
search engines. You can see the prototype at
http://www.worldcatlibraries.org/DSpace/. This produces a bushier tree than
DP9, making it easier for Google et al. to crawl in its entirety.
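
To make the idea concrete, here is a rough sketch of the page-building
step only. It is not the code behind the prototype. It assumes the records
have already been harvested (e.g. via ListRecords) into simple
(identifier, title, setSpec) tuples, that each record carries a single
setSpec, and that set levels are separated by colons as in OAI-PMH.
The names build_pages and page_name and the flat file layout are made up
for illustration.

```python
import html
from collections import defaultdict


def page_name(set_path):
    """Map a set path to a flat file name (hypothetical layout).

    Simplification: ':' becomes '_', which could collide if a setSpec
    already contains underscores.
    """
    return "index.html" if set_path == "" else set_path.replace(":", "_") + ".html"


def build_pages(records):
    """Return {file name: HTML} for a set-based hierarchy of static pages.

    records: iterable of (identifier, title, set_spec) tuples, where
    set_spec is a colon-separated OAI-PMH setSpec such as "comm1:coll1".
    """
    children = defaultdict(set)   # set path -> direct child set paths
    members = defaultdict(list)   # set path -> records filed directly under it

    for identifier, title, set_spec in records:
        parts = set_spec.split(":")
        # Register every ancestor/child link; "" stands for the root page.
        for i in range(len(parts)):
            parent = ":".join(parts[:i])
            child = ":".join(parts[:i + 1])
            children[parent].add(child)
        members[set_spec].append((identifier, title))

    pages = {}
    for path in set(children) | set(members):
        items = []
        # One link per sub-set keeps each page wide rather than deep.
        for child in sorted(children.get(path, ())):
            items.append('<li><a href="%s">%s</a></li>'
                         % (page_name(child), html.escape(child)))
        for identifier, title in members.get(path, ()):
            items.append("<li>%s (%s)</li>"
                         % (html.escape(title), html.escape(identifier)))
        pages[page_name(path)] = (
            "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
        )
    return pages
```

Writing each entry of the returned dict to disk under a web root would
give the crawler a tree whose depth follows the set hierarchy instead of
the number of resumption steps, which is the point of the approach.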

Jeff

> -----Original Message-----
> From: oai-implementers-bounces at openarchives.org 
> [mailto:oai-implementers-bounces at openarchives.org] On Behalf 
> Of Michael Nelson
> Sent: Tuesday, December 07, 2004 8:12 PM
> To: Pete Johnston
> Cc: oai-implementers at openarchives.org
> Subject: RE: [OAI-implementers] SOAP-PMH
> 
> 
> > I'm not sure it is strictly true that Google needs to invest in 
> > OAI-PMH in order to "index on OAI resources".
> >
> > The Googlebot can crawl HTML representations of the metadata records
> > which are also exposed via OAI-PMH (assuming they are served at
> > persistent Google-friendly URIs etc) Isn't this exactly what services
> > like the DP9 gateway enable/provide?
> >
> > http://dlib.cs.odu.edu/dp9/
> 
> yes, but there are a number of problems with this approach, 
> esp. for large sites.  many web crawlers are biased to go 
> "wide", not "deep", and DP9 produces deep trees that crawlers 
> don't always traverse.
> 
> DP9 can also be unkind to repositories; the proxy between the 
> robot and the repository obscures any throttling the repository does.
> 
> DP9 is a neat trick, but it is not a complete solution.
> 
> see also:
> 
> http://www.cs.odu.edu/~liu_x/dp9/dp9.pdf
> 
> regards,
> 
> Michael
> 
> >
> > As Jeff noted a couple of messages upthread, the issue is not SOAP v
> > OAI-PMH, or search v harvest, but whether an implementation of OAI-PMH
> > semantics over SOAP offers anything that is not available using the
> > current implementation over HTTP GET/POST.
> >
> > Pete
> >
> >
> > _______________________________________________
> > OAI-implementers mailing list
> > List information, archives, preferences and to unsubscribe:
> > http://www.openarchives.org/mailman/listinfo/oai-implementers
> >
> 
> ----
> Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/ 
> Dept of Computer Science, Old Dominion University, Norfolk VA 23529
> +1 757 683 6393 +1 757 683 4900 (f)
> 


