[UPS] Further issues with Dienst software and Open Archives

Robert Tansley rht96r@ecs.soton.ac.uk
Fri, 19 May 2000 16:04:47 +0100


Hi, an even more technical mail this time. (I hope these are appropriate
addresses to send this to; if not, please point me in the right
direction.)

Working on adding Open Archives support to CogPrints, I've discovered an
issue between recent versions of CGI.pm and the dienst code. As of version
2.64, the CGI.pm module by default uses "new style" URLs, in which the
keyword/value query string is passed to the application delimited by
semicolons ";" instead of ampersands "&". E.g. if I send a request:

Dienst/Repository/4.0/List-Contents?file-after=2000-01-01&meta-format=oams

The dienst software, from CGI::query_string(), receives the arguments as:

file-after=2000-01-01;meta-format=oams

so later on parsing this string produces duff results. (IMO it's a rather
dodgy practice for the CGI.pm team to change default behaviour like this
in a .01 revision.) This particular problem can be worked round by
changing the line where CGI.pm is included in dienst_src/Main/dienst.pl
from:

use CGI qw(:standard);

to:

use CGI qw(:standard -oldstyle_urls);
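
To make the failure concrete, here is a minimal sketch (in Python rather
than Perl, purely for illustration). parse_on_ampersand is a hypothetical
stand-in for the dienst parsing step, which I am assuming splits the query
string on "&" only; it is not the actual dienst code.

```python
def parse_on_ampersand(query):
    """Split a query string into key/value pairs on '&' only.

    Hypothetical stand-in for a parser that predates ';' separators.
    """
    pairs = {}
    for field in query.split("&"):
        # Everything after the first '=' is treated as the value.
        key, _, value = field.partition("=")
        pairs[key] = value
    return pairs


old_style = "file-after=2000-01-01&meta-format=oams"
new_style = "file-after=2000-01-01;meta-format=oams"

# With '&' separators both arguments are recovered.
print(parse_on_ampersand(old_style))
# With ';' separators the second argument is swallowed into the first value.
print(parse_on_ampersand(new_style))
```

With the ";"-delimited string, meta-format never appears as a key of its
own, which is the kind of duff result described above.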

However, this introduces another problem when using partitionspecs. If I
send a request like:

Dienst/Repository/4.0/List-Contents
     ?partitionspec=physics;hep&file-after=2000-01-01

CGI.pm now gives the query string to dienst as:

partitionspec=physics&hep&file-after=2000-01-01

so it's changing the semicolon in the partitionspec into an &. I tried
URL-encoding the ; (which sounds like good practice anyway) but CGI
doesn't decode the ; in this case, so dienst gets:

partitionspec=physics%3Bhep&file-after=2000-01-01

I can quite easily fix it so the CogPrints code can decode the string,
but with interoperability it takes two to tango; anything making a Dienst
request to CogPrints will have to know to encode the ';'. In the Dienst
protocol specification (either
http://www.cs.cornell.edu/cdlrg/dienst/protocols/DienstProtocol.htm or
http://www.cs.cornell.edu/cdlrg/dienst/protocols/OpenArchivesDienst.htm),
the example List-Contents request doesn't seem to have an encoded ';',
even though earlier in the document ';' is listed as a character that
requires encoding. So what is the policy on this? Should the ';' be
encoded, in which case the specification document needs to be amended to
reflect this, or should it be left unencoded, in which case the dienst
code needs changing if it is to work with recent versions of CGI?
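
For reference, the receiving-side fix mentioned above might look roughly
like the following sketch (Python for illustration only; parse_query is a
hypothetical name, not a CogPrints or dienst function). It assumes the
server splits on "&" first and percent-decodes each key and value
afterwards, so an encoded ";" survives inside a partitionspec.

```python
from urllib.parse import unquote


def parse_query(query):
    """Split on '&', then percent-decode each key and value.

    Hypothetical helper: decoding happens after splitting, so an
    encoded ';' (%3B) inside a value is never treated as a separator.
    """
    pairs = {}
    for field in query.split("&"):
        key, _, value = field.partition("=")
        pairs[unquote(key)] = unquote(value)
    return pairs


encoded = "partitionspec=physics%3Bhep&file-after=2000-01-01"
mangled = "partitionspec=physics&hep&file-after=2000-01-01"

# The encoded form round-trips to the intended partitionspec.
print(parse_query(encoded))
# The mangled form cannot be repaired: 'hep' becomes a separate, empty key.
print(parse_query(mangled))
```

The second call shows why the server cannot fix this alone: once the ";"
has been turned into "&", the partitionspec is indistinguishable from two
separate arguments, so the client has to send the encoded form.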

I also note that in the examples of both Dienst protocol specification
documents, the disseminate verb:

Dienst/Repository/1.0/Disseminate/handlecorp/970101/%23oams/xml

doesn't require the encoding of the / in the full ID "handlecorp/970101",
but does require it for the # ("%23oams"). (I even came across the big kludge in the
dienst code to handle this case!) Requiring some special characters to be
encoded but others to be left unencoded seems to be an inconsistency in
the protocol that needs clearing up.
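
To show the asymmetry, here is a small sketch (Python, illustration only)
that builds the Disseminate path both ways. The "consistent" variant is
just one possible reading, assuming every reserved character inside a
path component were encoded; it is not what either specification document
currently says.

```python
from urllib.parse import quote

prefix = "Dienst/Repository/1.0/Disseminate"
handle = "handlecorp/970101"   # full ID, contains '/'
fmt = "#oams"                  # contains '#'

# As in the specification example: '/' in the ID left alone, '#' encoded.
as_in_spec = "/".join([prefix, handle, quote(fmt, safe=""), "xml"])

# Hypothetical consistent variant: encode both reserved characters.
consistent = "/".join([prefix, quote(handle, safe=""), quote(fmt, safe=""), "xml"])

print(as_in_spec)
print(consistent)
# The unencoded '/' adds an extra path segment the server must re-join.
print(len(as_in_spec.split("/")), len(consistent.split("/")))
```

In the specification form the ID spans two path segments, so the server
has to know how to glue them back together; in the consistent form each
argument occupies exactly one segment.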

R

-- 
 Robert Tansley                    Tel: +44 (0) 23 80594492
 Multimedia Research Group         Fax: +44 (0) 23 80592865
 Electronics & Computer Science    http://www.ecs.soton.ac.uk/~rht96r/
 University of Southampton
 Southampton SO17 1BJ, UK