[OAI-implementers] Sets and stuff, OAI 2.0
Tue, 14 May 2002 21:13:41 -0400
hmmm ... an interesting interpretation ...
but as most people understand the protocol (v1.0, v1.1, and v2.0),
a record (everything between <record> and </record>) as returned by
ListRecords should be identical to the same record requested by its
identifier using GetRecord, for a given metadataPrefix (assuming no
dynamic updating between requests).
so, if i submit a ListRecords request for oai_dc and i get a record with
identifier xyz, i can then submit a GetRecord request with xyz/oai_dc
and get the same identical record back. most service providers and
harvesters will make this assumption so even if it is not explicitly
stated in the protocol spec, it may be prudent to adopt it as best practice.
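
a minimal sketch of how a harvester might check that assumption, given two responses already fetched as strings. the sample XML, the identifier oai:example.org:xyz and the helper names (canonical_records, same_record, wrap) are made up for illustration; only the namespace URIs come from the 2.0 schema.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace

def canonical_records(response_xml):
    """Map identifier -> canonical XML for every <record> in a response."""
    records = {}
    for rec in ET.fromstring(response_xml).iter(OAI + "record"):
        identifier = rec.findtext(f"{OAI}header/{OAI}identifier")
        rec.tail = None  # whitespace after </record> is not part of the record
        raw = ET.tostring(rec, encoding="unicode")
        # strip_text drops indentation-only differences between responses
        records[identifier] = ET.canonicalize(raw, strip_text=True)
    return records

def same_record(list_xml, get_xml, identifier):
    """True if the record appears in both responses with identical content."""
    listed = canonical_records(list_xml).get(identifier)
    fetched = canonical_records(get_xml).get(identifier)
    return listed is not None and listed == fetched

# Hypothetical sample record, reused in both responses.
RECORD = """<record>
  <header><identifier>oai:example.org:xyz</identifier><datestamp>2002-05-14</datestamp></header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Sample title</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""

def wrap(verb, body):
    """Wrap a record in a stripped-down response envelope for the given verb."""
    return (f'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
            f"<{verb}>{body}</{verb}></OAI-PMH>")

LIST_RESPONSE = wrap("ListRecords", RECORD)
GET_RESPONSE = wrap("GetRecord", RECORD.replace("\n", " "))  # whitespace differs only

if __name__ == "__main__":
    print(same_record(LIST_RESPONSE, GET_RESPONSE, "oai:example.org:xyz"))
```

the comparison is done on canonicalized XML so that differences in indentation between the two responses do not count as a mismatch; a stricter harvester could compare the raw bytes instead.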
> Thank you, Alan! (and Hussein!)
> I will certainly consider your hack, as it's a huge improvement over
> redundant records or multiple selects.
> Right now, what we put in ListRecords returns varies; I'd like to
> standardize it, but there appear to be no guidelines. I would like to
> return less than GetRecord, but am wondering if perhaps I should return
> the whole enchilada; currently, the most we return is everything but the
> "description" fields (for some sets).
> If ListRecords is used by harvesters to gather entire records quickly,
> and GetRecord is NOT used much (cumbersome, one record at a time), then
> I should want to include all of the record fields.
> If however, harvesters are using <subject> fields (or some such) in
> ListRecords returns to determine what they want to get as full records
> --and ignoring all else -- returning entire records is not worth the
> time and work for either end... and then <subject> fields would be
> very important (to me) to include.
> The 2.0 ListRecords examples only contain the following fields:
> title, creator, type, source, language, identifier; but Hussein tells
> me there are no actual guidelines for this.
> On Wed, 15 May 2002, Alan Kent wrote:
>>On Tue, May 14, 2002 at 05:58:15PM -0400, deridder wrote:
>>>The dilemma is: how to implement the database to return records in a
>>>timely manner, and be scalable.
>>>If I allow a record to be in 0-5 sets, and the set fields are in the
>>>same table as the record fields, 5 selects on the same table are required
>>>to respond to a single ListRecords request with a set argument.
>>>If I put the sets in a secondary table, pull out all the identifiers for
>>>a given set (same request), then when I have a request for ListRecords
>>>*without* a set argument, I need to do a select on the set table for each
>>>record returned in the ListRecords response.
>>I actually don't have a suggestion here. The database engine I am using
>>(ours! :-) supports nested repeating structures, so we can store and index
>>multiple sets directly in the record without problem. For a relational
>>database, what you are saying makes sense.
>>If you want a hacky suggestion, you could have a field in the same table
>>as the record which contains all the set names (separated by say spaces
>>or commas) so when you fetch the record you can return the set names
>>efficiently, but to allow efficient querying have a redundant separate table
>>of set names which can be joined back to the main record table. But I
>>am not speaking with any experience here.
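
one way the layout suggested above could look, as a rough sketch using SQLite from Python. the table names (records, record_sets), column names and helper functions are hypothetical, and it assumes set names contain no spaces, so a space can serve as the separator in the redundant column.

```python
import sqlite3

def make_db():
    """Create an in-memory database with the two-table layout."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE records (
            identifier TEXT PRIMARY KEY,
            metadata   TEXT NOT NULL,
            set_specs  TEXT NOT NULL DEFAULT ''  -- redundant, space-separated copy
        );
        CREATE TABLE record_sets (
            identifier TEXT NOT NULL REFERENCES records(identifier),
            set_spec   TEXT NOT NULL,
            PRIMARY KEY (set_spec, identifier)   -- indexed for lookups by set
        );
    """)
    return db

def add_record(db, identifier, metadata, set_specs):
    """Write the record and both copies of its set membership in one transaction."""
    with db:
        db.execute("INSERT INTO records VALUES (?, ?, ?)",
                   (identifier, metadata, " ".join(set_specs)))
        db.executemany("INSERT INTO record_sets VALUES (?, ?)",
                       [(identifier, s) for s in set_specs])

def list_records(db, set_spec=None):
    """Return (identifier, metadata, [set names]) rows with a single query."""
    if set_spec is None:
        # No set argument: the set names come straight from the record row.
        rows = db.execute(
            "SELECT identifier, metadata, set_specs FROM records "
            "ORDER BY identifier")
    else:
        # Set argument: one join through the indexed membership table.
        rows = db.execute(
            "SELECT r.identifier, r.metadata, r.set_specs "
            "FROM record_sets s JOIN records r ON r.identifier = s.identifier "
            "WHERE s.set_spec = ? ORDER BY r.identifier", (set_spec,))
    return [(ident, meta, specs.split()) for ident, meta, specs in rows]

if __name__ == "__main__":
    db = make_db()
    add_record(db, "oai:example.org:1", "<dc>...</dc>", ["physics", "museum"])
    add_record(db, "oai:example.org:2", "<dc>...</dc>", [])
    print(list_records(db, "physics"))
    print(list_records(db))
```

the price of the redundancy is that every change to a record's sets has to update both places, which is why the sketch does the two writes inside one transaction.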
>>>Maybe I should forget the sets altogether. For those of you with
>>>harvesters and search engines: how do you use the repository sets
>>> (or do you?)
>>Our intent is to let our implementation be configured so the person
>>controlling the harvest selects which sets to use. The idea as I
>>understand it is so a Museum with lots of different sorts of information
>>can make it all available, but a physics department could harvest from
>>lots of different sources information only relating to physics (e.g.,
>>how physics is used in carbon dating or something). But it's up to
>>the data provider to define sets, then up to the harvester to decide
>>which sets look interesting.
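
on the harvester side, picking a set comes down to adding the set argument to the ListRecords request. a small sketch, where the base URL and the function name list_records_url are placeholders and not a real repository or library call:

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL, optionally restricted to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec  # omitted entirely for a whole-repository harvest
    return base_url + "?" + urlencode(params)

if __name__ == "__main__":
    # Hypothetical repository base URL and set name.
    print(list_records_url("http://example.org/oai", set_spec="physics"))
```

a harvester configured with several sets would issue one such request per set and merge the results by identifier, since a record can belong to more than one set.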
>>>(Oh, and if you can recommend which ListRecords fields you have found
>>>useful, I'd like to hear about that also; I'd like to standardize my
>>I plan to keep the whole <record> in the database and so let applications
>>use what they want out of it. So I guess I would encourage you to return
>>as much as you can. Are there some specific areas you had in mind?
>>Hope this was a little help,
>>Alan Kent (mailto:email@example.com, http://www.mds.rmit.edu.au/~ajk/)
>>Project: TeraText Technical Director, InQuirion Pty Ltd (www.inquirion.com)
>>Postal: Multimedia Database Systems, RMIT, GPO Box 2476V, Melbourne 3001.
>>Where: RMIT MDS, Bld 91, Level 3, 110 Victoria St, Carlton 3053, VIC Australia.
>>Phone: +61 3 9925 4114 Reception: +61 3 9925 4099 Fax: +61 3 9925 4098
>>OAI-implementers mailing list
> PGPKey: http://www.cs.utk.edu/~deridder/jd-pgp.txt
hussein suleman - firstname.lastname@example.org - vtcs - http://www.husseinsspace.com