jyoung at oclc.org
Tue Dec 7 20:25:11 EST 2004
I tried to come up with a harvesting-based mechanism to work around the
limitations of DP9 for the DSpace community. Rather than querying a
repository interactively, as DP9 does, I harvest the repositories and
create a set-based hierarchy of static HTML pages that I then expose to
search engines. You can see the prototype at
http://www.worldcatlibraries.org/DSpace/. This produces a bushier tree than
DP9's, making it easier for Google et al. to crawl in its entirety.
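
As a rough illustration of the approach (a sketch only, not the code behind
the prototype), the snippet below groups already-harvested records by set
and writes static HTML listing pages plus one root index that links straight
to every listing page. That keeps each record two clicks from the root, which
is what makes the tree wide rather than deep. The record fields ("set",
"title", "url"), the function names, the page size and the file layout are
all assumptions made for the example.

```python
import re
from collections import defaultdict
from html import escape
from pathlib import Path


def safe_name(set_spec):
    """Turn a set name such as 'a:b' into a directory-safe name (hypothetical helper)."""
    return re.sub(r"[^A-Za-z0-9_-]", "_", set_spec)


def build_pages(records, out_dir, page_size=100):
    """Write paged listings per set and a flat root index; return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group the harvested records by the set they belong to.
    by_set = defaultdict(list)
    for rec in records:
        by_set[rec["set"]].append(rec)

    written = []
    index_links = []
    for set_name in sorted(by_set):
        recs = by_set[set_name]
        set_dir = out / safe_name(set_name)
        set_dir.mkdir(exist_ok=True)
        # Split each set into fixed-size listing pages.
        for page_no, start in enumerate(range(0, len(recs), page_size), start=1):
            chunk = recs[start:start + page_size]
            items = "\n".join(
                f'<li><a href="{escape(r["url"])}">{escape(r["title"])}</a></li>'
                for r in chunk
            )
            page = set_dir / f"page{page_no}.html"
            page.write_text(
                f"<html><body><h1>{escape(set_name)}</h1>\n<ul>\n{items}\n</ul></body></html>",
                encoding="utf-8",
            )
            written.append(page)
            # The root index links to every listing page directly (wide, not deep).
            rel = f"{safe_name(set_name)}/page{page_no}.html"
            index_links.append(
                f'<li><a href="{rel}">{escape(set_name)} (page {page_no})</a></li>'
            )

    index = out / "index.html"
    index.write_text(
        "<html><body><ul>\n" + "\n".join(index_links) + "\n</ul></body></html>",
        encoding="utf-8",
    )
    written.append(index)
    return written
```

Because the output is plain files, a crawler never touches the source
repository; only the harvester does, and it can honor whatever throttling the
repository applies.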
> -----Original Message-----
> From: oai-implementers-bounces at openarchives.org
> [mailto:oai-implementers-bounces at openarchives.org] On Behalf
> Of Michael Nelson
> Sent: Tuesday, December 07, 2004 8:12 PM
> To: Pete Johnston
> Cc: oai-implementers at openarchives.org
> Subject: RE: [OAI-implementers] SOAP-PMH
> > I'm not sure it is strictly true that Google needs to invest in
> > OAI-PMH in order to "index on OAI resources".
> > The Googlebot can crawl HTML representations of the metadata records
> > which are also exposed via OAI-PMH (assuming they are served at
> > persistent Google-friendly URIs etc) Isn't this exactly what services
> > like the DP9 gateway enable/provide?
> > http://dlib.cs.odu.edu/dp9/
> yes, but there are a number of problems with this approach,
> esp. for large sites. many web crawlers are biased to go
> "wide", not "deep", and DP9 produces deep trees that crawlers
> don't always traverse.
> DP9 can also be unkind to repositories; the proxy between the
> robot and the repository obscures any throttling the repository does.
> DP9 is a neat trick, but it is not a complete solution.
> see also:
> > As Jeff noted a couple of messages upthread, the issue is not SOAP v
> > OAI-PMH, or search v harvest, but whether an implementation of OAI-PMH
> > semantics over SOAP offers anything that is not available using the
> > current implementation over HTTP GET/POST.
> > Pete
> > _______________________________________________
> > OAI-implementers mailing list
> > List information, archives, preferences and to unsubscribe:
> > http://www.openarchives.org/mailman/listinfo/oai-implementers
> Michael L. Nelson mln at cs.odu.edu http://www.cs.odu.edu/~mln/
> Dept of Computer Science, Old Dominion University, Norfolk VA 23529
> +1 757 683 6393 +1 757 683 4900 (f)