[OAI-implementers] SetSpec RegExp

ePrints Support support@eprints.org
Wed, 29 Aug 2001 02:09:11 +0100

I was planning to make the eprints code able to use any
field for OAI sets, not just "subjects" like it does now.

The intention is to be able to export, for example, each author
as a set.

If you do set a maximum length, I'd hope it was pretty large,
like over a "k". I'm not saying it needs to be, but arbitrary
restrictions make me edgy.

Something like 4096 bytes (is that the legal max for a URL?)
would be more than enough. But I reckon more than enough is
better than just enough. It is quite possible to imagine
someone using the MD5 of something as the set tags (or
whatever), and once you got 5 levels deep it would start to
get really long.
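
To put rough numbers on that, here is a small sketch of the arithmetic.
It assumes each level of the setSpec is the 32-character hex MD5 digest
of some value; the level names and function names are made up for
illustration and are not eprints code. It also shows the length if each
digest were then hex-encoded again, the way other set tags would be.

```python
import hashlib


def md5_component(value: str) -> str:
    """Return the 32-character hex MD5 digest of a value (one setSpec level)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def nested_setspec(values):
    """Join one MD5 component per level with ':' as the separator."""
    return ":".join(md5_component(v) for v in values)


# Hypothetical five-level hierarchy, purely for illustration.
levels = ["physics", "optics", "lasers", "diode", "blue"]
spec = nested_setspec(levels)

# Five 32-character digests plus four ':' separators.
plain_len = len(spec)

# If every component were hex-encoded again, each character becomes two;
# the separators stay as they are.
hex_encoded_len = sum(2 * len(c) for c in spec.split(":")) + spec.count(":")

print(plain_len, hex_encoded_len)  # 164 324
```

So five levels of MD5 come to a few hundred characters: long, but still
well under a 4096-byte ceiling.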

OK, I'm being overly paranoid, but I was brought up with
people quoting the old "640k should be enough for anybody"
story at me.

On Tue, Aug 28, 2001 at 03:32:43PM -0600, Simeon Warner wrote:
> I see no reason why users should ever see the setSpec, they
> should see the setName instead. Following that logic I
> don't see that strangely encoded setSpecs should matter.
> However, your suggestion is slightly unwieldy. If such schemes
> are adopted and setSpecs get large then we might want to
> consider adding recommendations on the maximum lengths of these
> things to the protocol spec.
> Cheers,
> Simeon.
> On Tue, 28 Aug 2001, ePrints Support wrote:
> > (if this message appears 3 times, sorry, I kept sending
> > it from the wrong account)
> >
> > Argh. I've been working on a minor upgrade to eprints 1.1
> > to bring it "up to code" with regards to OAI1.1 and I just
> > discovered that the SetSpec only allows a-zA-Z0-9 and : as
> > a separator.
> >
> > Our standard default sets use '-' all over the place.
> >
> > I'm looking at encoding the setspecs as hex strings 0-9A-F
> > so "A" is encoded as "41" etc. This way I can even use UTF-8
> > which means I can do some very interesting things...
> >
> > This _will_ mean that people running eprints will have all
> > their OAI setspecs change. But seeing as their current ones
> > are illegal, that's not a big problem.
> >
> > A bigger problem is that where we currently have bio:bio-ani-behav
> >
> > we now have:
> > 62696F:62696F2D616E692D6265686176
> > which is less human-readable. Does that really matter as it's just
> > a key?
> >
> > Comments please!
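
For anyone wanting to try the hex scheme from the quoted message above,
here is a minimal sketch. The function names are hypothetical, and the
pattern is only my reading of the rule as described in that message
(alphanumeric components separated by ':'); check it against the
protocol document before relying on it.

```python
import re

# Assumed pattern: one or more alphanumeric components separated by ':'.
SETSPEC_RE = re.compile(r"[A-Za-z0-9]+(?::[A-Za-z0-9]+)*")


def encode_setspec(parts):
    """Hex-encode the UTF-8 bytes of each component and join with ':'."""
    return ":".join(p.encode("utf-8").hex().upper() for p in parts)


def decode_setspec(spec):
    """Reverse encode_setspec: split on ':' and decode each hex component."""
    return [bytes.fromhex(c).decode("utf-8") for c in spec.split(":")]


spec = encode_setspec(["bio", "bio-ani-behav"])
print(spec)                                      # 62696F:62696F2D616E692D6265686176
print(decode_setspec(spec))                      # ['bio', 'bio-ani-behav']
print(bool(SETSPEC_RE.fullmatch(spec)))          # True
print(bool(SETSPEC_RE.fullmatch("bio:bio-ani-behav")))  # False, '-' is rejected
```

Because the encoding works on UTF-8 bytes, non-ASCII set tags round-trip
too, at the cost of doubling the length of every component.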


 Christopher Gutteridge                   support@eprints.org 
 ePrints Technical Support                +44 23 8059 4833