[OAI-implementers] protocol comments, OAI 2.0

Walter Underwood wunder@inktomi.com
Wed, 30 Jan 2002 15:00:37 -0800


--On Wednesday, January 30, 2002 02:08:26 PM -0500 Simeon Warner <simeon@cs.cornell.edu> wrote:
> 
> The technical committee agreed that the protocol should be more decoupled
> from HTTP but we didn't feel that SOAP is the correct option at the
> moment. 

I would be very interested in the reasons for this. With a SOAP
interface, it would be fairly easy to build a harvester for
our search engine. It would be a very nice sample program for our
indexing interface. But with a one-of-a-kind XML protocol, it isn't 
worth the trouble.

I believe that not using SOAP is a serious mistake. It means that
OAI will remain a niche protocol, with few implementations, few
users, and little positive effect.

With SOAP, you get scaling support, test suites, development tools,
supported libraries, directory service, etc. A custom protocol can
never catch up. And implementors have much better things to do
with their time than re-invent RPC.
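For comparison, a SOAP call is an XML envelope that toolkits normally generate and parse for you. The sketch below builds one by hand with Python's standard library for a ListRecords-style request. The envelope namespace is the real SOAP 1.1 one. The operation namespace `urn:example:oai-soap`, the child element names and the function name are hypothetical, since no SOAP binding for OAI is defined in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Real SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical namespace for an OAI operation; no such binding is assumed to exist.
OAI_NS = "urn:example:oai-soap"


def build_list_records_envelope(metadata_prefix: str, from_date: Optional[str] = None) -> bytes:
    """Wrap a hypothetical ListRecords call in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    ET.register_namespace("oai", OAI_NS)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{OAI_NS}}}ListRecords")
    ET.SubElement(call, f"{{{OAI_NS}}}metadataPrefix").text = metadata_prefix
    if from_date is not None:
        # Optional incremental-harvest argument.
        ET.SubElement(call, f"{{{OAI_NS}}}from").text = from_date
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)
```

With a SOAP toolkit, this envelope would be derived from a service description instead of written by hand. That derivation is the kind of tooling support described above.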

> Load is a concern for some implementers. For example, arXiv (the
> repository I work with) would not want to give clients the opportunity to
> ask for all 185,000 metadata records in one response.

The system can limit the number returned in each response.
On the other hand, if the client really needs all the records,
one response is probably more efficient than multiple responses.
And the one response can be cached externally.
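Here is a minimal sketch of a server that caps each response, paired with a client that loops until it has everything. It assumes a simple offset-based continuation token. The token format and the names `page_records` and `harvest_all` are hypothetical. OAI's own mechanism for continuing a list is the resumption token, which this only imitates in spirit.

```python
from typing import List, Optional, Tuple


def page_records(records: List[str], limit: int, token: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return at most `limit` records starting at the offset encoded in `token`.

    The token is just the next offset as a string (hypothetical format);
    None means the client has received the last batch.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    start = int(token) if token is not None else 0
    batch = records[start:start + limit]
    next_start = start + len(batch)
    next_token = str(next_start) if next_start < len(records) else None
    return batch, next_token


def harvest_all(records: List[str], limit: int) -> List[str]:
    """Client loop: keep requesting until the server stops returning a token."""
    collected: List[str] = []
    token: Optional[str] = None
    while True:
        batch, token = page_records(records, limit, token)
        collected.extend(batch)
        if token is None:
            return collected
```

Setting `limit` at or above the repository size yields the single response discussed above, which an external cache could then store.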
  
> ListRecords provides a better way to say if-modified-since for any set or
> the whole repository.

It replicates a standard HTTP feature in a non-standard way. As someone
who writes spiders/harvesters, I think it is not better. At HP, we
used to say, "standard is better than better". But, I'm honestly
not sure whether SOAP allows If-Modified-Since. If it doesn't, then
don't use it.
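For comparison, here is a sketch of both incremental-harvest styles in Python.

- **HTTP conditional GET.** The client sends `If-Modified-Since`. The server answers 304 Not Modified when nothing has changed.
- **OAI ListRecords.** The request carries a `from` date.

The function names, the example base URL and the `oai_dc` default are assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional
from urllib.parse import urlencode


def if_modified_since_header(last_harvest: datetime) -> Dict[str, str]:
    """Client side: build the conditional header from the last harvest time (must be UTC)."""
    return {"If-Modified-Since": format_datetime(last_harvest, usegmt=True)}


def conditional_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Server side: 304 if unchanged since the header date, else 200."""
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an unparseable date means the header is ignored
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)  # HTTP dates are always GMT
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


def list_records_url(base_url: str, from_date: str, metadata_prefix: str = "oai_dc") -> str:
    """The OAI-style equivalent: ask for records changed on or after `from_date`."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": from_date})
    return f"{base_url}?{query}"
```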

> It is not possible to get away from the concept of repositories (usually
> different servers run by different people). Some people want to use sets
> and they can; others can ignore them and there is no overhead in that
> case.

Perhaps my point wasn't clear. Since there is no way to address a
request to more than one set, there is no need for sets to be visible
in the protocol. Different sets can be different protocol endpoints,
with no loss of generality. Instead of repository, speak of the
server which is providing access to the sets.

So one server could provide these:

  arXiv
  arXiv-cs
  arXiv-math
  arXiv-physics
  arXiv-nlin

Or those could be spread across multiple servers.
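Here is a minimal sketch of that arrangement. Each set is exposed as its own endpoint, so requests carry no set parameter. The endpoint names come from the list above. The record layout with a `subject` field and the helper names are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Filter = Callable[[Record], bool]


def build_endpoints(subjects: List[str]) -> Dict[str, Filter]:
    """Map endpoint names to record filters: one umbrella endpoint plus one per subject."""
    endpoints: Dict[str, Filter] = {"arXiv": lambda rec: True}
    for subject in subjects:
        # Bind `subject` as a default argument so each lambda keeps its own value.
        endpoints[f"arXiv-{subject}"] = lambda rec, s=subject: rec["subject"] == s
    return endpoints


def handle_request(endpoints: Dict[str, Filter], records: List[Record], endpoint: str) -> List[Record]:
    """Serve a request addressed to one endpoint; the set is implied by the address."""
    matches = endpoints.get(endpoint)
    if matches is None:
        raise KeyError(f"unknown endpoint: {endpoint}")
    return [rec for rec in records if matches(rec)]
```

Because the set is chosen by the endpoint address, the same table could be split across several servers without changing the request format.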

Moving service discovery to UDDI also means that the OAI protocol doesn't
need the ListSets request.

wunder
--
Walter Underwood
wunder@inktomi.com
Senior Staff Engineer, Inktomi
http://www.inktomi.com/