[OAI-implementers] DP9 and HTML metadata

Walter Underwood wunder@inktomi.com
Thu, 24 Jan 2002 11:02:55 -0800


As a spider engineer, I'd like to suggest an improvement to DP9.
I'm sending this to the whole OAI list partly to introduce myself,
and partly because it is an interesting omission in DP9.

DP9 should use HTML metadata standards to present the Dublin Core
metadata. Right now, it pretty-prints the record for human readers,
but a spider cannot reliably pick the individual fields out of that
markup.

In addition to the pretty representation, the generated HTML should
include meta tags for each DC element. I'd recommend also using
native HTML/HTTP standards for a couple of the elements:

   dc.title:Hamlet --> <title>Hamlet</title>
   dc.language:en  --> <meta http-equiv="content-language" content="en">
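
As a rough illustration of the mapping above, here is a minimal
Python sketch that turns one record into HTML head lines. It is not
DP9 code: the function name dc_head and the dict-of-lists record
shape are assumptions made for this example, and the "DC." name
prefix with a schema.DC link follows the usual convention for
embedding Dublin Core in HTML (RFC 2731).

```python
from html import escape

def dc_head(record):
    """Render a record (hypothetical shape: DC element -> list of values)
    as lines for the HTML <head>."""
    # Declare the DC namespace for the "DC." prefix used below.
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']

    # Native HTML equivalent for the title (first value only).
    titles = record.get("title", [])
    if titles:
        lines.append("<title>%s</title>" % escape(titles[0]))

    # Native HTTP-equivalent header for each language value.
    for lang in record.get("language", []):
        lines.append('<meta http-equiv="content-language" content="%s">'
                     % escape(lang, quote=True))

    # One meta tag per DC element value, including title and language.
    for element, values in record.items():
        for value in values:
            lines.append('<meta name="DC.%s" content="%s">'
                         % (escape(element, quote=True),
                            escape(value, quote=True)))
    return "\n".join(lines)

print(dc_head({"title": ["Hamlet"], "language": ["en"]}))
```

The title and language are emitted twice on purpose: once in the
native HTML/HTTP form and once as DC meta tags, so a spider that
understands either form finds them.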

Our engine (Inktomi Enterprise Search) will use that metadata for
the information presented in the results page. In addition, the
engine can be configured to use DC.identifier as the URL which is
presented with the results.

Finally, if there are browsable index pages with links to the 
generated GetRecord pages, those should probably include a
noindex robots meta tag. Lists of URLs are usually not very
useful search results. They are excellent roots (start pages)
for spidering, though.
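
A browsable index page along those lines might be generated as in
the sketch below. The function name index_page and its arguments
are hypothetical, not part of DP9. The robots value "noindex,follow"
keeps the page itself out of the results while still letting a
spider follow its links to the record pages.

```python
from html import escape

def index_page(title, record_urls):
    """Build a link-list page that spiders may crawl but should not index."""
    items = "\n".join(
        '<li><a href="%s">%s</a></li>' % (escape(u, quote=True), escape(u))
        for u in record_urls
    )
    return "\n".join([
        "<html>",
        "<head>",
        "<title>%s</title>" % escape(title),
        # noindex: do not list this page; follow: do crawl its links.
        '<meta name="robots" content="noindex,follow">',
        "</head>",
        "<body>",
        "<ul>",
        items,
        "</ul>",
        "</body>",
        "</html>",
    ])

# Placeholder URLs for illustration only.
print(index_page("Browse records", ["http://example.org/record/1",
                                    "http://example.org/record/2"]))
```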

wunder
--
Walter Underwood
wunder@inktomi.com
Senior Staff Engineer, Inktomi
http://www.inktomi.com/