[OAI-implementers] Server Load for ListIdentifiers, ListRecords calls

Yi-Lun Ding yding@TNC.ORG
Fri, 12 Apr 2002 17:03:32 -0400


Jeff:

Have you thought about replicating the metadata in another database, and
letting the secondary database handle the crawl calls, e.g., ListRecords,
ListIdentifiers?

Even with an elegant solution, I am still concerned about the load on my
primary database.  I am tempted to just return "Service Unavailable" for
anything that requires a big db dump.  I have not seen anything about 2.0
yet, but does it include any provision for limiting certain calls by
hostname and/or by time?  Also, the combination of our middleware and object-oriented
database schema may limit me in terms of existing solutions.
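
A minimal sketch of the kind of restriction described above, refusing the
bulk verbs with HTTP 503 "Service Unavailable" unless the caller is a known
host inside an off-peak window. The host name, the window, the retry delay,
and the function name check_request are all hypothetical; nothing here is
taken from the OAI specification or from any existing server.

```python
from datetime import datetime, time

# Hypothetical policy values, chosen only for illustration.
ALLOWED_HOSTS = {"harvester.example.org"}
QUIET_START = time(1, 0)    # 01:00 local time
QUIET_END = time(5, 0)      # 05:00 local time
BULK_VERBS = {"ListRecords", "ListIdentifiers"}
RETRY_AFTER_SECONDS = 3600


def check_request(verb, host, now):
    """Return (http_status, headers) for an incoming request."""
    # Cheap verbs are always served.
    if verb not in BULK_VERBS:
        return 200, {}
    # Bulk verbs are served only to known hosts during the quiet window.
    in_window = QUIET_START <= now.time() < QUIET_END
    if host in ALLOWED_HOSTS and in_window:
        return 200, {}
    # 503 plus Retry-After asks a well-behaved client to come back later.
    return 503, {"Retry-After": str(RETRY_AFTER_SECONDS)}
```

For example, check_request("ListRecords", "harvester.example.org",
datetime(2002, 4, 13, 2, 0)) is allowed, while the same call at 17:00 is
turned away with a Retry-After header.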

-----Original Message-----
From: Young,Jeff [mailto:jyoung@oclc.org]
Sent: Friday, April 12, 2002 2:26 PM
To: 'yding@TNC.ORG'; oai-implementers@oaisrv.nsdl.cornell.edu
Subject: RE: [OAI-implementers] Server Load for ListIdentifiers,
ListRecords calls


Yi-Lun,

Our theses and dissertations repository has over 4 million records.
Performance was so bad in my OAI v1.1 implementation that it was effectively
unusable for a repository of this size. I expect to have it resolved in my 2.0
upgrade.

The way I plan to deal with it is to have our OAI server examine the from
and until dates to see if they imply a harvest of the repository in its
entirety. This should be a reasonable expectation the first time a client
harvests a repository. If so, I will read the database directly from
beginning to end without going through the indexes. I also plan to use the
compression feature of OAIv2. Lastly, I'm currently going through the new
server code looking for optimization opportunities, of which there are
plenty.
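
The from/until check described above could be sketched roughly as follows.
This is an illustration only, not the actual server code: the function
names, the plan labels, and the idea of keeping the earliest and latest
datestamps on hand are assumptions.

```python
from datetime import date


def is_full_harvest(from_date, until_date, earliest, latest):
    """True when the requested range covers every record's datestamp.

    from_date and until_date are None when the client omitted the argument;
    earliest and latest are the oldest and newest datestamps in the repository.
    """
    covers_start = from_date is None or from_date <= earliest
    covers_end = until_date is None or until_date >= latest
    return covers_start and covers_end


def choose_plan(from_date, until_date, earliest, latest):
    """Pick a read strategy: scan the whole store, or use the date index."""
    if is_full_harvest(from_date, until_date, earliest, latest):
        return "sequential_scan"
    return "index_lookup"
```

A first-time harvest with no dates at all gives "sequential_scan"; a later
incremental request whose from date falls after the earliest record gives
"index_lookup".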

Our OAI server and harvester software will be available as open-source. The
server is written as a Java Servlet and includes an abstract database
interface to allow access to any database engine that implements it. There
will even be an implementation of the abstract database class included to
treat a file system as a repository.
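
To illustrate the shape of such an abstract database interface, here is a
small sketch with a file-system implementation behind it. It is written in
Python rather than as a Java Servlet, and every name in it (RecordStore,
FileSystemStore, the one-file-per-record ".xml" layout) is hypothetical; it
is not the interface of the software described above.

```python
import os
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """The operations the server needs from any backing store."""

    @abstractmethod
    def list_identifiers(self):
        """Return all record identifiers in a stable order."""

    @abstractmethod
    def get_record(self, identifier):
        """Return the metadata for one record as a string."""


class FileSystemStore(RecordStore):
    """Treats a directory as a repository: one <identifier>.xml file per record."""

    def __init__(self, root):
        self.root = root

    def list_identifiers(self):
        # Only .xml files count as records; the extension is stripped.
        return sorted(
            name[:-4] for name in os.listdir(self.root) if name.endswith(".xml")
        )

    def get_record(self, identifier):
        # A real server would validate the identifier before building a path.
        path = os.path.join(self.root, identifier + ".xml")
        with open(path, encoding="utf-8") as handle:
            return handle.read()
```

The server code would depend only on RecordStore, so a relational or
object-oriented database could be substituted by writing another subclass.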

I would encourage you to use an existing open-source implementation of OAI.
They are available in a variety of flavors if Java Servlets aren't to your
taste. Information about existing implementations is available on
http://www.openarchives.org/tools/tools.html. Expect announcements of OAIv2
upgrades in the coming weeks. The more interest there is in reusing these
tools, the better we will make them.

Sincerely,

Jeff

> -----Original Message-----
> From: Yi-Lun Ding [mailto:yding@TNC.ORG]
> Sent: Friday, April 12, 2002 12:09 PM
> To: oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: [OAI-implementers] Server Load for ListIdentifiers,
> ListRecords calls
>
>
> I am thinking of implementing OAI, but am a little wary of the load
> requirements of ListIdentifiers and ListRecords for large document
> repositories.  One, there is the bandwidth requirement of transferring huge
> blocks of data.  Two, the process would have to go through each record in
> the database and check the TimeModified/Set attributes.
>
> How are people dealing with this issue?
>
> Thanks,
>
> yi-lun
>
>
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>