[OAI-implementers] List Id's for multiple sets

deridder deridder@cs.utk.edu
Fri, 9 Feb 2001 09:51:20 -0500 (EST)

Good gracious, Tim!  That *is* complex.  What happens if you have 40
harvesters working on your program at once?  You would have multiple
tables. Are you using cookies?  And do you have time limitations on
accessing those temp tables?  If so, how do you implement that, and do
you remove all current temp tables on each new query?  Seems like that
would mess up any harvests still in progress.  But unaccessed tables could
build up also, so I certainly see that they would need to be periodically
cleared out.


   And yes, I for one would like to see your OAI "bits";  I'd love to
compare how I'm doing things with how others are, to see if I can improve
on my methods.


 On Fri, 9 Feb 2001, Tim Brody wrote:

> On Thu, 8 Feb 2001, deridder wrote:
> >   This is looking more complicated than I expected.  With no dates
> > specified, and no sets specified, the list could be enormous;  and as more
> > and more sets are added, the resumption tokens could get pretty hairy too.
> Excuse my ignorance if this is already obvious to you:
> (as suggested by Chris Gutteridge, this is how I have implemented
> resumptionTokens)
> Initial request:
> Build a temporary table of all the identifiers that match the request,
> this CAN get huge but if you want harvesters to get all of your repository
> there isn't much choice...(indeed I would argue this is more efficient
> than enumerating over sets)
> Output the first 400 records (or whatever) from the temporary table, using
> the identifiers as an index into your database/file system. The
> resumptionToken will be the name of your temporary table and an encoded
> string to tell you what the metadataFormat is (required for ListRecords).
> Temporary table is:
>   pos  int, auto_increment
>   id   char(64)
> The id is either the OAI identifier or your archive's own identifier; if
> you store the OAI identifier, ListIdentifiers can be answered from the
> temporary table alone.
> Later requests:
> Get the appropriate list of identifiers by selecting rows where
> "pos > start".
> To manage the temporary tables I have another table, the temp index, which
> stores the table names and the last time they were accessed. Whenever a
> query is started I remove old temporary tables and their associated
> entries in the temp index. To make the resumptionToken even simpler you
> could store the metadataPrefix in the index ...
> The initial request can be very slow, as it has to enumerate over your
> entire archive, but subsequent requests are very quick. Each harvester (if
> it is well behaved) will only need to do this once; subsequent queries
> should use "from" to grab only the latest data.
> e.g. (liable to be broken and knackered as is my wont)
> http://cite-base.ecs.soton.ac.uk/cgi-bin/oai/OAI-script?debug=1&verb=ListRecords&m
> As an aside, I have tried to write my OAI "bits" as a separate, non
> archive-specific library - would people be interested in access to this (I
> cannot guarantee its correctness or robustness, only that it supports the
> bits of OAI that I've needed)?
> All the best,
> Tim Brody
> Computer Science, University of Southampton
> email: tdb198@soton.ac.uk
> Web: http://www.ecs.soton.ac.uk/~tdb198/
> _______________________________________________
> OAI-implementers mailing list
> OAI-implementers@oaisrv.nsdl.cornell.edu
> http://oaisrv.nsdl.cornell.edu/mailman/listinfo/oai-implementers

pub  1024/341840AD 2001/01/26 Jody DeRidder <deridder@cs.utk.edu>
Key fingerprint =  07 1D D3 00 21 2F FA 83  E8 FD B7 80 D2 D9 D5 2D