[OAI-implementers] OAI-PMH & SOAP

Walter Underwood wunder@inktomi.com
Sat, 02 Feb 2002 16:34:08 -0800


--On Saturday, February 2, 2002 4:45 PM -0500 Hussein Suleman <hussein@vt.edu> wrote:
>
> - OAI-PMH is not for everyone ... if we generalize it to serve the needs
> of every community it will not be as useful for the purposes for which
> it was intended originally (namely, high-quality metadata transfer among
> digital library systems)

I want that library information to be easily available to all,
not just people willing to run a library-only protocol. I'm not
trying to make libraries different, or change the OAI goal.

With a SOAP protocol, any scripted web page can make a call to OAI.
Servers like DP9 and the repository explorer become very easy to
write. A professor's list of publications could be built from
the eprint data.
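To illustrate "any scripted web page can make a call," here is a minimal sketch of a SOAP 1.1 request envelope built with Python's standard library. It assumes a hypothetical SOAP binding for OAI in which the OAI verb becomes the operation name and the request arguments become child elements. The namespace `urn:example:oai-soap` and the function `build_request` are made up for illustration. `ListRecords` and `metadataPrefix=oai_dc` are taken from OAI-PMH itself.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace (from the SOAP spec).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an OAI SOAP binding; an assumption for illustration.
OAI_NS = "urn:example:oai-soap"


def build_request(verb: str, **params: str) -> bytes:
    """Wrap an OAI verb and its arguments in a SOAP envelope (hypothetical mapping)."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("oai", OAI_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The verb becomes the operation element inside the SOAP Body.
    call = ET.SubElement(body, f"{{{OAI_NS}}}{verb}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{OAI_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


print(build_request("ListRecords", metadataPrefix="oai_dc").decode("utf-8"))
```

Any SOAP toolkit could generate or parse the same envelope. That is the point: a client would not need OAI-specific request code.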

Many of our customers are libraries, or have libraries of valuable
docs. Pharmaceutical and financial companies would love to have a
protocol like this. Customers regularly ask us how to deal with
metadata stored separately from documents.

> - the primary users of OAI will not be "harvesters" (in the crawler sense)
> ... OAI is specifically NOT trying to create a better Google ... OAI-PMH
> is aimed at high-quality metadata transfer among managed digital libraries

Well, Inktomi is a better Google, but that is a different issue.

PMH seems aimed at batch transfers between WAIS/Harvest-style systems.
Modern spiders stopped doing that five years ago, and we know a lot more
now. Modern spiders do incremental fetches, adaptive revisits, duplicate
detection, authentication, session cookies, etc.

We can share that experience.

For example, the current approach to lists (ListRecords) allows
a big server to accidentally mount a denial-of-service attack on a
client. All it has to do is return 1 million records of 1 KB each
(about 1 GB in one response) and watch the client die. That is bad.

In a safe list protocol, the client requests a maximum number of
results, and the server is allowed to return fewer. That way, both
sides are safe.
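A minimal sketch of that safe list exchange, assuming an in-memory record list and an integer offset as the continuation cursor. The names `list_records`, `harvest`, and `SERVER_MAX` are hypothetical and not part of OAI-PMH. The server returns at most `min(requested, server_max)` records, so neither an oversized request nor an oversized response can occur.

```python
from typing import List, Optional, Tuple

# Hypothetical server-side cap: the server may always return fewer than asked.
SERVER_MAX = 100


def list_records(records: List[str], start: int, requested: int,
                 server_max: int = SERVER_MAX) -> Tuple[List[str], Optional[int]]:
    """Return at most min(requested, server_max) records starting at `start`,
    plus the offset to continue from (None when the list is exhausted)."""
    if requested < 1:
        raise ValueError("client must request at least one record")
    count = min(requested, server_max)
    batch = records[start:start + count]
    next_start = start + len(batch)
    return batch, (next_start if next_start < len(records) else None)


def harvest(records: List[str], batch_size: int) -> List[str]:
    """Client loop: never asks for more than batch_size records per request."""
    out: List[str] = []
    cursor: Optional[int] = 0
    while cursor is not None:
        batch, cursor = list_records(records, cursor, batch_size)
        out.extend(batch)
    return out


records = [f"rec{i}" for i in range(250)]
batch, nxt = list_records(records, 0, 1_000_000)  # greedy request is capped at 100
print(len(batch), nxt)
```

The client bounds its memory by choosing `batch_size`, and the server bounds its work with `server_max`. Each side protects itself without trusting the other.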

> ... i cannot say "aye" or "nay" to SOAP until i have tested it and i
> think it is reasonable to expect the same of everyone else.

Or maybe not. The ayes include Microsoft, IBM, Sun, Apple,
Oracle, HP, Compaq, SAP, IONA, and so on. The Apache project has two
free implementations.

OAI already uses an XML-based RPC. Switching from a non-standard XML
RPC to a standard one should be an obvious decision.

Frankly, the only drawback to SOAP is that the interface definition
language, WSDL, is really ugly.

But go ahead and read the SOAP spec. It is rather clear and short
as these things go:

  http://www.w3.org/TR/SOAP/

wunder
--
Walter R. Underwood
Senior Staff Engineer
Inktomi Enterprise Search
http://search.inktomi.com/