[OAI-implementers] Server Load for ListIdentifiers, ListRecords calls

Michael L. Nelson mln@ils.unc.edu
Fri, 12 Apr 2002 17:20:09 -0400 (EDT)


On Fri, 12 Apr 2002, Yi-Lun Ding wrote:

> Jeff:
> 
> Have you thought about replicating the metadata in another database, and
> letting the secondary database handle the crawl calls, e.g., ListRecords,
> ListIdentifiers?

though my DP is much smaller, I implemented the capability for the NACA
OAI interface to redirect the harvester to a back-up version (on another
network) if the load on the host machine was above a configurable
threshold.  I did this with an HTTP 302 status code.  In doing this,
however, you should be careful to set the requestURL element to what the
harvester originally asked for, not the URL it was redirected to (i.e.,
<requestURL>a.foo.org</requestURL>, not
<requestURL>b.bar.org</requestURL>).
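
a minimal sketch of that redirect logic in Python, not the actual NACA
code.  the "/oai" path, the MAX_LOAD value and the function names are all
assumptions made up for illustration; only the a.foo.org / b.bar.org hosts
come from the example above.  the same request_url() would run on the
mirror, so both hosts report the address the harvester originally used.

```python
import os
from urllib.parse import urlencode

# Hypothetical base URLs: the harvester knows only the primary one.
PRIMARY_BASE_URL = "http://a.foo.org/oai"
MIRROR_BASE_URL = "http://b.bar.org/oai"
MAX_LOAD = 4.0  # configurable load threshold (assumed value)


def current_load():
    """Return the 1-minute load average (os.getloadavg is Unix-only)."""
    return os.getloadavg()[0]


def route_request(params, load, max_load=MAX_LOAD):
    """Return (status, headers) for an incoming OAI request.

    Above the threshold, send the harvester to the mirror with a 302.
    """
    if load > max_load:
        location = MIRROR_BASE_URL + "?" + urlencode(params)
        return 302, {"Location": location}
    return 200, {"Content-Type": "text/xml"}


def request_url(params):
    """URL to report in <requestURL>: what the harvester asked for,
    never the mirror's own address."""
    return PRIMARY_BASE_URL + "?" + urlencode(params)
```

a front end would call route_request(params, current_load()) on each
request and only build the XML response when the status is 200.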

assuming you have the resources to do so, running mirrors / backups of
your repository is the right thing to do (tm).  

> 
> Even with an elegant solution, I am still concerned about the load on my
> primary database.  I am tempted to just return "Service Unavailable" for
> anything that requires a big db dump.  I have not seen anything about 2.0
> yet, but are there considerations to limit certain calls by hostname and/or
> by time? 

2.0 neither addresses nor prohibits this; it is out of scope of the protocol
itself, but can be accomplished using the standard set of HTTP mechanisms.

it is entirely possible for you to limit access to your repository based
on hostname, time, passwords, etc.  you'd probably want to avoid shutting
off things like "ListRecords" altogether (you could become non-compliant
that way), but you could play with the number of records returned before
a resumptionToken is issued, the retry intervals specified in your 503
responses, etc.
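
as a rough illustration of those two knobs, here is a sketch in Python.
the batch size, the busy-hours policy and the use of a plain list offset
as the resumptionToken are assumptions for the example; a real repository
is free to encode its tokens however it likes.  Retry-After is the
standard HTTP header that accompanies a 503.

```python
BATCH_SIZE = 100            # records returned before a resumptionToken is issued
RETRY_AFTER_SECONDS = 600   # interval advertised in the 503 response
BUSY_HOURS = range(9, 17)   # hypothetical policy: turn harvesters away 09:00-16:59


def throttle(hour, busy_hours=BUSY_HOURS, retry_after=RETRY_AFTER_SECONDS):
    """Return a (status, headers) pair asking the harvester to come back
    later, or None if the request may proceed."""
    if hour in busy_hours:
        return 503, {"Retry-After": str(retry_after)}
    return None


def list_batch(identifiers, token=None, batch_size=BATCH_SIZE):
    """Return one batch of identifiers plus the token for the next batch.

    The token here is just the offset of the next record, as a string;
    None means the list is complete.
    """
    start = int(token) if token else 0
    batch = identifiers[start:start + batch_size]
    next_start = start + batch_size
    next_token = str(next_start) if next_start < len(identifiers) else None
    return batch, next_token
```

lowering BATCH_SIZE or raising RETRY_AFTER_SECONDS trades harvester
convenience for lower load on the database, without refusing the verb
outright.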


Contextually dependent harvesting is sure to be a reality as more DPs come
online.

regards,

Michael


> Also, the combination of our middleware and object-oriented
> database schema may limit me in terms of existing solutions.
> 
> -----Original Message-----
> From: Young,Jeff [mailto:jyoung@oclc.org]
> Sent: Friday, April 12, 2002 2:26 PM
> To: 'yding@TNC.ORG'; oai-implementers@oaisrv.nsdl.cornell.edu
> Subject: RE: [OAI-implementers] Server Load for ListIdentifiers,
> ListRecords calls
> 
> 
> Yi-Lun,
> 
> Our theses and dissertations repository has over 4 million records.
> Performance was so bad in my OAI v1.1 implementation that it was effectively
> unusable for this size repository. I expect to have it resolved in my 2.0
> upgrade.
> 
> The way I plan to deal with it is to have our OAI server examine the from
> and until dates to see if they imply a harvest of the repository in its
> entirety. This should be a reasonable expectation the first time a client
> harvests a repository. If so, I will read the database directly from
> beginning to end without going through the indexes. I also plan to use the
> compression feature of OAIv2. Lastly, I'm currently going through the new
> server code looking for optimization opportunities, of which there are
> plenty.
> 
> Our OAI server and harvester software will be available as open-source. The
> server is written as a Java Servlet and includes an abstract database
> interface to allow access to any database engine that implements it. There
> will even be an implementation of the abstract database class included to
> treat a file system as a repository.
> 
> I would encourage you to use an existing open-source implementation of OAI.
> They are available in a variety of flavors if Java Servlets aren't to your
> taste. Information about existing implementations is available on
> http://www.openarchives.org/tools/tools.html. Expect announcements of OAIv2
> upgrades in the coming weeks. The more interest there is in reusing these
> tools, the better we will make them.
> 
> Sincerely,
> 
> Jeff
> 
> > -----Original Message-----
> > From: Yi-Lun Ding [mailto:yding@TNC.ORG]
> > Sent: Friday, April 12, 2002 12:09 PM
> > To: oai-implementers@oaisrv.nsdl.cornell.edu
> > Subject: [OAI-implementers] Server Load for ListIdentifiers,
> > ListRecords
> > calls
> >
> >
> > I am thinking of implementing OAI, but am a little wary of the load
> > requirements of ListIdentifiers and ListRecords for large document
> > repositories.  One, there is the bandwidth requirement of transferring
> > huge blocks of data.  Two, the process would have to go through each
> > record in the database and check the TimeModified/Set attributes.
> >
> > How are people dealing with this issue?
> >
> > Thanks,
> >
> > yi-lun
> >
> >
> > _______________________________________________
> > OAI-implementers mailing list
> > OAI-implementers@oaisrv.nsdl.cornell.edu
> > http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers
> >
> 

---
Michael L. Nelson
NASA Langley Research Center		m.l.nelson@larc.nasa.gov
MS 158, Hampton, VA 23681		http://www.ils.unc.edu/~mln/
+1 757 864 8511				+1 757 864 8342 (f)