[OAI-general] question about using the Va Tech provider

Hussein Suleman hussein@cs.uct.ac.za
Fri, 28 Mar 2003 01:12:34 +0200


(this should have gone to oai-implementors so i am cross-posting it)

regarding the VTOAI code i published a while ago, there isn't much it 
does that can lead to a scalability problem. offhand, the only time a 
large chunk of data is needed is when you ListRecords (and maybe 

in any event, i do not suggest 10k records - what is the byte size of a 
typical response? 30MB? that's really too much ...

the memory needed to store and process that is probably the cause of the 
problem from the VTOAI perspective, and also from the perspective of 
some XML parsers. instead, try 1000-1500 records. remember also there 
are some potential service providers like me down here in South Africa 
who would almost never be able to get a 30MB chunk at most times of the 
day over our network :)
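
to illustrate the chunking idea, here is a rough sketch in Python (not the VTOAI Perl code) of serving a ListRecords result in smaller pieces with a resumption token. the function name, the chunk size of 1000, and the use of a plain offset as the token are all assumptions made for the example; a real provider would normally make the token opaque and also carry the request arguments in it.

```python
# hypothetical sketch: split a ListRecords result into fixed-size chunks.
# CHUNK_SIZE and list_records_chunk are illustrative names, not part of VTOAI.
CHUNK_SIZE = 1000  # assumed value within the suggested 1000-1500 range


def list_records_chunk(identifiers, resumption_token=None, chunk_size=CHUNK_SIZE):
    """Return (chunk, next_token) for one response.

    The token here is simply the offset of the next record, as a string.
    next_token is None once the final chunk has been returned.
    """
    start = int(resumption_token) if resumption_token else 0
    end = start + chunk_size
    chunk = identifiers[start:end]
    next_token = str(end) if end < len(identifiers) else None
    return chunk, next_token
```

a harvester would keep requesting with the returned token until it comes back empty, so no single response has to hold the whole repository.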

lastly, fwiw, FastCGI won't help for this problem. but if you want to 
try a really obscure solution, the code has an internal representation 
of a record that you might try replacing with a pointer of some sort. 
that way you don't simultaneously store lots of records in memory but 
you still have enough info to generate the record in XML when required.
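
a minimal sketch of that pointer idea, again in Python rather than the actual VTOAI Perl: keep only an identifier and a file path per record, and read the flat XML file only at the moment the record is written out. the class and function names are hypothetical, the record header is simplified (no datestamp), and the file is assumed to hold a ready-made metadata fragment.

```python
from xml.sax.saxutils import escape


class RecordPointer:
    """Lightweight stand-in for a record: identifier plus file location only."""

    def __init__(self, identifier, path):
        self.identifier = identifier
        self.path = path  # assumed: a flat XML file holding the metadata

    def to_xml(self):
        # The metadata is read from disk only when the record is needed.
        with open(self.path, encoding="utf-8") as fh:
            metadata = fh.read().strip()
        return (
            "<record><header><identifier>%s</identifier></header>"
            "<metadata>%s</metadata></record>" % (escape(self.identifier), metadata)
        )


def stream_records(pointers):
    """Yield records one at a time so only one is held in memory at once."""
    for pointer in pointers:
        yield pointer.to_xml()
```

the response can then be written out record by record from `stream_records`, rather than building the full list of records first.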

hope this helps ... if you want more details about anything really 
specific, send me a direct email.


will.sexton@duke.edu wrote:
> I'm working on establishing OAI provider services, starting with a
> metadata set related to a particular project.  We're using the Va Tech
> "VTOAI OAI-PMH2 PERL Implementation", running on a Solaris machine, under
> Apache server cgi.  The data set is derived from several sources; we've
> worked to convert it all into flat XML files.
> We have, I believe, about 16k records.  When using the repository explorer
> at http://oai.dlib.vt.edu to test our setup with smaller data sets (up to
> 1k records) it works fine.  However, in recent days I've transferred up
> about 10k records for testing, and now we're finding that we're getting
> time-out errors.
> I've spoken with someone in the past who implemented the Va Tech
> implementation using FastCGI in Apache, and they said they had good
> results with it.  But I was working with our sysadmin today to set this
> up, and we've been getting 500 (internal server) errors when we test;
> mainly, I think, because we're not completely sure about how to wrap the
> oai.pl process in FastCGI.
> I know that the author of the Va Tech implementation is on this list, as
> well as others who seem to have a fair amount of experience with OAI
> implementations.  So I'm seeking some advice.  Is the Va Tech
> implementation, plus or minus FastCGI, sufficiently robust to handle a
> repository of 15k+ XML-file records?  If so, and we need to use FastCGI,
> can someone offer some guidance on using them together?  And if not,
> should we consider another provider framework, and which one?
> Thanks in advance...
> Will
> --
> Will Sexton / Metadata Architect * Research & Content Development
> Perkins Library * Duke University / http://www.duke.edu/~wsexton/
> _______________________________________________
> OAI-general mailing list
> OAI-general@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-general

hussein suleman ~ hussein@cs.uct.ac.za ~ http://www.husseinsspace.com