[OAI-implementers] DP9 and HTML metadata

Michael L. Nelson mln@ils.unc.edu
Thu, 24 Jan 2002 14:18:25 -0500 (EST)


Walter,

These are excellent suggestions, and ones that I'm sure Xiaoming Liu can
easily add.

But since you're on the line, I have some questions for you ;-)

1.  Do you have an official or personal opinion that you can share about
OAI & spidering?  

2.  DP9 is great for spiders that don't know any better, but what are the
chances of "OAI-aware" spiders?  Or is that such a special case that it's
not worth accounting for?

Specifically, I maintain http://naca.larc.nasa.gov/.  Spiders are
frequently churning around in the tens of thousands of possible pages
there.  Of course, this is a good substitute:

http://arc.cs.odu.edu:8080/dp9/listidentifiers/NACA

but even better would be a spider that knew to use:

http://naca.larc.nasa.gov/oai/
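
To make "OAI-aware" concrete, here is a rough sketch of what such a spider
might do instead of crawling pages: ask the repository for its identifiers
directly. It assumes an OAI-PMH 2.0 style request and response (the
metadataPrefix argument and the XML namespace are assumptions, and older
protocol versions differ); the function names are hypothetical and no
network call is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumption: responses use the OAI-PMH 2.0 namespace.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_identifiers_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListIdentifiers request URL for an OAI base URL."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # optional: restrict the harvest to one set
    return base_url + "?" + urlencode(params)


def extract_identifiers(response_xml):
    """Return the record identifiers found in a ListIdentifiers response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in root.iter(OAI_NS + "identifier")]


if __name__ == "__main__":
    # A spider would fetch this URL, then request each record it lists.
    print(list_identifiers_url("http://naca.larc.nasa.gov/oai/"))
```

The point is that one request per batch of identifiers replaces a walk
over tens of thousands of HTML pages.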

thanks,

Michael

On Thu, 24 Jan 2002, Walter Underwood wrote:

> As a spider engineer, I'd like to suggest an improvement to DP9.
> I'm sending this to the whole OAI list partly to introduce myself,
> and partly because it is an interesting omission in DP9.
> 
> DP9 should use HTML metadata standards to present the Dublin Core
> metadata. Right now, it prettyprints the info, but that is not
> useful for a spider. 
> 
> In addition to the pretty representation, the generated HTML should
> include meta tags for each DC element. I'd recommend also using
> native HTML/HTTP standards for a couple of the elements:
> 
>    dc.title:Hamlet --> <title>Hamlet</title>
>    dc.language:en  --> <meta http-equiv="content-language" content="en">
> 
> Our engine (Inktomi Enterprise Search) will use that metadata for
> the information presented in the results page. In addition, the
> engine can be configured to use DC.identifier as the URL which is
> presented with the results.
> 
> Finally, if there are browsable index pages with links to the 
> generated GetRecord pages, those should probably include a
> noindex robots meta tag. Lists of URLs are usually not very
> useful search results. They are excellent roots (start pages)
> for spidering, though.
> 
> wunder
> --
> Walter Underwood
> wunder@inktomi.com
> Senior Staff Engineer, Inktomi
> http://www.inktomi.com/
> 
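
For reference, the meta tag suggestion quoted above could be sketched
roughly as follows. This is only an illustration, not how DP9 is
implemented: the function names and the dict-of-lists record layout are
hypothetical, and the name="DC.element" form is an assumption based on the
usual convention for Dublin Core in HTML.

```python
from html import escape


def dc_to_html_head(record):
    """Render Dublin Core elements as HTML <head> lines.

    `record` maps unqualified DC element names (e.g. "title") to a list
    of string values. This layout is an assumption for the sketch.
    """
    lines = []
    for element, values in record.items():
        for value in values:
            text = escape(value)  # also escapes quotes for attribute use
            # Native HTML/HTTP equivalents, as suggested in the message.
            if element == "title":
                lines.append("<title>%s</title>" % text)
            elif element == "language":
                lines.append(
                    '<meta http-equiv="content-language" content="%s">' % text
                )
            # One meta tag per DC element value.
            lines.append('<meta name="DC.%s" content="%s">' % (element, text))
    return "\n".join(lines)


def index_page_robots_meta():
    """Robots meta tag for browsable index pages (lists of URLs)."""
    # "noindex" keeps the list out of search results; links on the page
    # can still serve as starting points for spidering.
    return '<meta name="robots" content="noindex">'


if __name__ == "__main__":
    print(dc_to_html_head({"title": ["Hamlet"], "language": ["en"]}))
```

A generated GetRecord page would carry the output of dc_to_html_head() in
its head section next to the pretty-printed view, while index pages would
carry the robots tag instead.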

---
Michael L. Nelson
NASA Langley Research Center		m.l.nelson@larc.nasa.gov
MS 158, Hampton, VA 23681		http://www.ils.unc.edu/~mln/
+1 757 864 8511				+1 757 864 8342 (f)