[OAI-implementers] Server Load for ListIdentifiers, ListRecords calls

Young,Jeff jyoung@oclc.org
Fri, 12 Apr 2002 17:29:08 -0400


I see. My 45 million record database isn't being used by any other system,
so I never really worried about performance degradation.

You may be on the right track with the secondary database idea. Put an OAI
server on top of both of them, but make sure the primary repository is
behind your firewall. Access from outside the firewall should only go to the
secondary repository. Run an OAI harvester against the primary on a regular
basis to keep the secondary current. That way, the load on your main system
will be minimal.
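
A minimal sketch of that harvesting step, in Python for brevity. It assumes
the primary answers ListRecords with record elements and an optional
resumptionToken directly inside a ListRecords element; the base URL and the
fetch and store callables are hypothetical stand-ins for your own HTTP
client and secondary database, not part of any existing tool.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build a ListRecords request URL for the primary repository."""
    if resumption_token is not None:
        # A resumptionToken is sent on its own, without the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # only records changed since this date
    return base_url + "?" + urlencode(params)


def local_name(tag):
    """Strip an XML namespace such as '{uri}record' down to 'record'."""
    return tag.rsplit("}", 1)[-1]


def harvest(base_url, fetch, store, from_date=None):
    """Copy records changed since from_date into the secondary repository.

    fetch(url) returns the response XML as text (hypothetical HTTP helper).
    store(record) saves one record element (hypothetical database helper).
    Returns the number of records stored.
    """
    count = 0
    url = list_records_url(base_url, from_date=from_date)
    while url:
        root = ET.fromstring(fetch(url))
        # Locate the ListRecords element, whether it is the root or nested.
        container = next(
            (e for e in root.iter() if local_name(e.tag) == "ListRecords"),
            None,
        )
        token = None
        for child in (container if container is not None else []):
            name = local_name(child.tag)
            if name == "record":
                store(child)
                count += 1
            elif name == "resumptionToken":
                # An empty token marks the last page of the list.
                token = (child.text or "").strip() or None
        url = list_records_url(base_url, resumption_token=token) if token else None
    return count
```

Run from a scheduled job with from_date set to the date of the previous
successful run, this only asks the primary for records that have changed,
which is what keeps the load on the main system small.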

Jeff

> -----Original Message-----
> From: Yi-Lun Ding [mailto:yding@TNC.ORG]
> Sent: Friday, April 12, 2002 5:04 PM
> To: oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: RE: [OAI-implementers] Server Load for ListIdentifiers,
> ListRecords calls
> 
> 
> Jeff:
> 
> Have you thought about replicating the metadata in another database, and
> letting the secondary database handle the crawl calls, e.g., ListRecords,
> ListIdentifiers?
> 
> Even with an elegant solution, I am still concerned about the load on my
> primary database.  I am tempted to just return "Service Unavailable" for
> anything that requires a big db dump.  I have not seen anything about 2.0
> yet, but are there considerations to limit certain calls by hostname
> and/or by time?  Also, the combination of our middleware and
> object-oriented database schema may limit me in terms of existing
> solutions.
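
The "Service Unavailable" idea can be sketched as a small admission check
that runs before any database work. This is an illustration only: the host
allow-list, the busy-hour window, and the retry delay are hypothetical
policy values, and whether a given harvester honors the Retry-After header
is an assumption.

```python
HEAVY_VERBS = {"ListRecords", "ListIdentifiers"}  # the bulk "crawl" calls
TRUSTED_HOSTS = {"harvester.example.org"}         # hypothetical allow-list
BUSY_HOURS = range(8, 18)                         # hypothetical: 08:00 to 17:59
RETRY_AFTER_SECONDS = 3600                        # hypothetical delay


def admit(verb, remote_host, hour):
    """Return (http_status, extra_headers) for an incoming OAI request.

    Bulk verbs from unknown hosts are turned away during busy hours with
    HTTP 503 and a Retry-After header; everything else is let through.
    """
    is_heavy = verb in HEAVY_VERBS
    is_trusted = remote_host in TRUSTED_HOSTS
    if is_heavy and not is_trusted and hour in BUSY_HOURS:
        return 503, {"Retry-After": str(RETRY_AFTER_SECONDS)}
    return 200, {}
```

For example, admit("ListRecords", "crawler.example.net", 10) refuses the
request, while the same call at hour 2, or from the trusted host, is served.
Cheap calls such as Identify or GetRecord are never refused by this check.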
> 
> -----Original Message-----
> From: Young,Jeff [mailto:jyoung@oclc.org]
> Sent: Friday, April 12, 2002 2:26 PM
> To: 'yding@TNC.ORG'; oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: RE: [OAI-implementers] Server Load for ListIdentifiers,
> ListRecords calls
> 
> 
> Yi-Lun,
> 
> Our theses and dissertations repository has over 4 million records.
> Performance was so bad in my OAI v1.1 implementation that it was
> effectively unusable for a repository of this size. I expect to have it
> resolved in my 2.0 upgrade.
> 
> The way I plan to deal with it is to have our OAI server examine the from
> and until dates to see if they imply a harvest of the repository in its
> entirety. This should be a reasonable expectation the first time a client
> harvests a repository. If so, I will read the database directly from
> beginning to end without going through the indexes. I also plan to use
> the compression feature of OAIv2. Lastly, I'm currently going through the
> new server code looking for optimization opportunities, of which there
> are plenty.
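
The plan in the paragraph above might look roughly like this. It is a Python
sketch rather than the Java server's actual code: the function names and the
plan labels are hypothetical, the repository's earliest and latest
datestamps are assumed to be known, and compression is assumed to be plain
HTTP gzip content encoding.

```python
import gzip
from datetime import date


def is_full_harvest(from_date, until_date, earliest, latest):
    """True when the requested range covers every datestamp in the repository.

    A missing from or until argument counts as unbounded on that side.
    """
    starts_early_enough = from_date is None or from_date <= earliest
    ends_late_enough = until_date is None or until_date >= latest
    return starts_early_enough and ends_late_enough


def choose_plan(from_date, until_date, earliest, latest):
    """Pick how to read the database for a ListRecords request."""
    if is_full_harvest(from_date, until_date, earliest, latest):
        return "sequential_scan"  # read start to end, bypassing the indexes
    return "index_lookup"         # narrow range: use the datestamp index


def encode_body(body, accept_encoding):
    """Gzip the response bytes when the client lists gzip as acceptable.

    Simplification: quality values such as 'gzip;q=0' are ignored.
    """
    codings = {
        part.split(";")[0].strip().lower()
        for part in accept_encoding.split(",")
    }
    if "gzip" in codings:
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}
```

A first-time harvester that sends no from or until argument gets the
sequential scan; a later incremental request with a recent from date falls
back to the index.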
> 
> Our OAI server and harvester software will be available as open-source.
> The server is written as a Java Servlet and includes an abstract database
> interface to allow access to any database engine that implements it.
> There will even be an implementation of the abstract database class
> included to treat a file system as a repository.
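
To illustrate the shape of such an abstraction (in Python here, not the Java
of the server described above), the sketch below defines an abstract record
store and a file-system implementation. The class and method names, and the
one-file-per-record layout with an .xml suffix, are assumptions made for the
example; they are not the interface of the software being announced.

```python
import os
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """What the OAI server needs from any backing database (hypothetical)."""

    @abstractmethod
    def identifiers(self):
        """Return all record identifiers in a stable order."""

    @abstractmethod
    def get_record(self, identifier):
        """Return the metadata for one record as text."""


class FileSystemStore(RecordStore):
    """Treat a directory as a repository: one file per record."""

    def __init__(self, root, suffix=".xml"):
        self.root = root
        self.suffix = suffix

    def identifiers(self):
        names = os.listdir(self.root)
        return sorted(
            n[: -len(self.suffix)] for n in names if n.endswith(self.suffix)
        )

    def get_record(self, identifier):
        # Refuse identifiers that would escape the repository directory.
        if os.path.basename(identifier) != identifier:
            raise KeyError(identifier)
        path = os.path.join(self.root, identifier + self.suffix)
        if not os.path.isfile(path):
            raise KeyError(identifier)
        with open(path, encoding="utf-8") as handle:
            return handle.read()
```

The server code would depend only on RecordStore, so supporting another
database engine means writing one more subclass rather than touching the
protocol handling.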
> 
> I would encourage you to use an existing open-source implementation of
> OAI. They are available in a variety of flavors if Java Servlets aren't
> to your taste. Information about existing implementations is available on
> http://www.openarchives.org/tools/tools.html. Expect announcements of
> OAIv2 upgrades in the coming weeks. The more interest there is in reusing
> these tools, the better we will make them.
> 
> Sincerely,
> 
> Jeff
> 
> > -----Original Message-----
> > From: Yi-Lun Ding [mailto:yding@TNC.ORG]
> > Sent: Friday, April 12, 2002 12:09 PM
> > To: oai-implementers@oaisrv.nsdl.cornell.edu
> > Subject: [OAI-implementers] Server Load for ListIdentifiers,
> > ListRecords calls
> >
> >
> > I am thinking of implementing OAI, but am a little wary of the load
> > requirements of ListIdentifiers and ListRecords for large document
> > repositories.  One, there is the bandwidth requirement of transferring
> > huge blocks of data.  Two, the process would have to go through each
> > record in the database and check the TimeModified/Set attributes.
> >
> > How are people dealing with this issue?
> >
> > Thanks,
> >
> > yi-lun
> >
> >
> > _______________________________________________
> > OAI-implementers mailing list
> > OAI-implementers@oaisrv.nsdl.cornell.edu
> > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> >
> 