[OAI-implementers] Server Load for ListIdentifiers, ListRecords calls

Young,Jeff jyoung@oclc.org
Fri, 12 Apr 2002 14:26:25 -0400


Yi-Lun,

Our theses and dissertations repository has over 4 million records.
Performance was so bad in my OAI v1.1 implementation that it was effectively
unusable for a repository of this size. I expect to have this resolved in my
2.0 upgrade.

The way I plan to deal with it is to have our OAI server examine the from
and until dates to see whether they imply a harvest of the entire
repository. A full harvest is the usual case the first time a client
harvests a repository. If so, I will read the database sequentially from
beginning to end without going through the indexes. I also plan to use the
compression feature of OAIv2. Lastly, I'm currently going through the new
server code looking for optimization opportunities, of which there are
plenty.
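A minimal sketch of the full-harvest check and the compression step, in
Java to match the servlet. The names (HarvestPlanner, choose, acceptsGzip,
gzip) are hypothetical, not taken from our server. It assumes day-granularity
datestamps and that the repository's earliest and latest datestamps are
known. In OAI-PMH 2.0, compression is negotiated through the HTTP
Accept-Encoding header, so the helper only compresses when the client asks
for gzip.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.time.LocalDate;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; names are illustrative, not from the actual server.
final class HarvestPlanner {

    /** How a ListIdentifiers/ListRecords request will be served. */
    enum ScanMode { SEQUENTIAL_FULL_SCAN, INDEXED_RANGE_QUERY }

    private HarvestPlanner() {}

    /**
     * Chooses a sequential scan when the from/until window covers every
     * datestamp in the repository. A null bound means the client omitted it.
     */
    static ScanMode choose(LocalDate from, LocalDate until,
                           LocalDate earliestDatestamp, LocalDate latestDatestamp) {
        boolean coversStart = from == null || !from.isAfter(earliestDatestamp);
        boolean coversEnd = until == null || !until.isBefore(latestDatestamp);
        return (coversStart && coversEnd)
                ? ScanMode.SEQUENTIAL_FULL_SCAN
                : ScanMode.INDEXED_RANGE_QUERY;
    }

    /** True if the Accept-Encoding header lists gzip (the "*" wildcard is not handled). */
    static boolean acceptsGzip(String acceptEncoding) {
        if (acceptEncoding == null) {
            return false;
        }
        for (String token : acceptEncoding.split(",")) {
            String[] parts = token.split(";");
            if (!parts[0].trim().equalsIgnoreCase("gzip")) {
                continue;
            }
            // "gzip;q=0" means the client explicitly refuses gzip.
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                if (param.startsWith("q=")) {
                    try {
                        return Double.parseDouble(param.substring(2)) > 0;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                }
            }
            return true;
        }
        return false;
    }

    /** Gzip-compresses a response body before it is sent to the client. */
    static byte[] gzip(byte[] body) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(body);
        }
        return buffer.toByteArray();
    }
}
```

In a servlet, the SEQUENTIAL_FULL_SCAN branch would walk the table in storage
order, and the other branch would use the datestamp index. When acceptsGzip
returns true, the response would also carry "Content-Encoding: gzip".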

Our OAI server and harvester software will be available as open-source. The
server is written as a Java Servlet and includes an abstract database
interface to allow access to any database engine that implements it. It
will even include an implementation of the abstract database class that
treats a file system as a repository.
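One way such an abstract database interface might look, with a file-system
implementation that treats each file in a directory as one record. This is a
hypothetical sketch: OaiDatabase, RecordHeader, listHeaders, getRecord and
FileSystemDatabase are illustrative names, not the released classes, and using
a file's last-modified time as its OAI datestamp is an assumption.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstract database interface; names are illustrative only.
abstract class OaiDatabase {

    /** Minimal record header: an identifier plus its datestamp. */
    static final class RecordHeader {
        final String identifier;
        final Instant datestamp;

        RecordHeader(String identifier, Instant datestamp) {
            this.identifier = identifier;
            this.datestamp = datestamp;
        }
    }

    /** Headers whose datestamp falls in [from, until]; a null bound is open. */
    abstract List<RecordHeader> listHeaders(Instant from, Instant until) throws IOException;

    /** Raw metadata bytes for one record. */
    abstract byte[] getRecord(String identifier) throws IOException;
}

// Treats each regular file directly under root as one record and uses the
// file's last-modified time as its datestamp (an assumption for this sketch).
final class FileSystemDatabase extends OaiDatabase {
    private final Path root;

    FileSystemDatabase(Path root) {
        this.root = root.toAbsolutePath().normalize();
    }

    @Override
    List<RecordHeader> listHeaders(Instant from, Instant until) throws IOException {
        List<RecordHeader> result = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(root)) {
            for (Path file : dir) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                Instant modified = Files.getLastModifiedTime(file).toInstant();
                boolean afterFrom = from == null || !modified.isBefore(from);
                boolean beforeUntil = until == null || !modified.isAfter(until);
                if (afterFrom && beforeUntil) {
                    result.add(new RecordHeader(file.getFileName().toString(), modified));
                }
            }
        }
        return result;
    }

    @Override
    byte[] getRecord(String identifier) throws IOException {
        Path file = root.resolve(identifier).normalize();
        // Reject identifiers such as "../secret" that escape the repository directory.
        if (!root.equals(file.getParent())) {
            throw new IllegalArgumentException("Invalid identifier: " + identifier);
        }
        return Files.readAllBytes(file);
    }
}
```

The servlet would code against OaiDatabase only, so a relational back end and
the file-system back end could be swapped without touching the protocol layer.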

I would encourage you to use an existing open-source implementation of OAI.
They are available in a variety of flavors if Java Servlets aren't to your
taste. Information about existing implementations is available on
http://www.openarchives.org/tools/tools.html. Expect announcements of OAIv2
upgrades in the coming weeks. The more interest there is in reusing these
tools, the better we will make them.

Sincerely,

Jeff

> -----Original Message-----
> From: Yi-Lun Ding [mailto:yding@TNC.ORG]
> Sent: Friday, April 12, 2002 12:09 PM
> To: oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: [OAI-implementers] Server Load for ListIdentifiers, ListRecords calls
> 
> 
> I am thinking of implementing OAI, but am a little wary of the load
> requirements of ListIdentifiers and ListRecords for large document
> repositories. One, there is the bandwidth requirement of transferring huge
> blocks of data. Two, the process would have to go through each record in
> the database and check the TimeModified/Set attributes.
> 
> How are people dealing with this issue?
> 
> Thanks,
> 
> yi-lun
> 
> 
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
>