[UPS] RE: Further issues with Dienst software and Open Archives

David Fielding fielding@cs.cornell.edu
Tue, 23 May 2000 16:15:14 -0400


Robert,

	I am looking into this and so far I don't like what I see.
Seems the HTML 4.0 spec recommends using ';' instead of '&' to separate
URL query arguments, and CGI.pm is moving to implement that recommendation.

	I believe if we encode ';' characters in the partitionspec and modify
Dienst to accept '&'- or ';'-delimited URLs, decoding the encoded ';'
characters afterwards, we will be fine.
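
A minimal Perl sketch of the approach above, assuming the raw query string
is already available as a scalar (e.g. from $ENV{QUERY_STRING}). The names
parse_dienst_query and uri_unescape_simple are hypothetical and not part of
the Dienst code. The key point is the order of operations: split on either
delimiter first, then decode, so an encoded %3B survives the split.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Dienst): decode '+' and %XX escapes in a
# single key or value, *after* the query string has been split into pairs.
sub uri_unescape_simple {
    my ($s) = @_;
    $s =~ tr/+/ /;
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# Hypothetical parser: accept either '&' or ';' as the pair delimiter, so
# both old-style and new-style (HTML 4.0) query strings work. A literal ';'
# inside a value must therefore arrive encoded as %3B; it is restored only
# by the decoding step, after splitting.
sub parse_dienst_query {
    my ($query) = @_;
    my %args;
    for my $pair (split /[&;]/, $query) {
        next if $pair eq '';
        my ($key, $value) = split /=/, $pair, 2;
        $value = '' unless defined $value;
        $args{ uri_unescape_simple($key) } = uri_unescape_simple($value);
    }
    return \%args;
}

my $args = parse_dienst_query('partitionspec=physics%3Bhep;file-after=2000-01-01');
print "$_ => $args->{$_}\n" for sort keys %$args;
# file-after => 2000-01-01
# partitionspec => physics;hep
```

Note that an unencoded request such as partitionspec=physics;hep would be
split into two pairs by this parser, which is why the scheme only works if
clients encode the ';'.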

	Lincoln Stein suggests using an older version of CGI.pm for now.
The change is inevitable, so we need to deal with it; it might as well be now.

	I will try to spend some time on this tomorrow and write up
a more thorough response. I will also try to address encoding issues
which are complicated by Apache's refusal to handle encoded '/' characters.

More tomorrow,
David

-----Original Message-----
From: Carl Lagoze 
Sent: Tuesday, May 23, 2000 5:54 AM
To: 'Robert Tansley'; David Fielding
Cc: OA discussion list
Subject: RE: Further issues with Dienst software and Open Archives


Hi Rob,  I believe David Fielding is looking into this?  Let me know if not.

Looks like we'll have some interesting talks in San Antonio next week.

Regards,

Carl

> -----Original Message-----
> From: Robert Tansley [mailto:rht96r@ecs.soton.ac.uk]
> Sent: Friday, May 19, 2000 11:05 AM
> To: lagoze@CS.Cornell.EDU; help@ncstrl.org
> Cc: OA discussion list
> Subject: Further issues with Dienst software and Open Archives
> 
> 
> Hi, an even more technical mail this time. (I hope these are appropriate
> addresses to send this to; if not, please point me in the right
> direction.)
> 
> Working on adding Open Archives support to CogPrints, I've discovered an
> issue with the latest version of CGI.pm and the dienst code. As of version
> 2.64, the CGI.pm module by default uses "new style" URLs, in which the
> keyword/value query string is passed to the application delimited by
> semicolons ";" instead of ampersands "&". E.g. if I send a request:
> 
> Dienst/Repository/4.0/List-Contents?file-after=2000-01-01&meta-format=oams
> 
> The dienst software, from CGI::query_string(), receives the arguments as:
> 
> file-after=2000-01-01;meta-format=oams
> 
> so later on parsing this string produces duff results. (IMO it's a rather
> dodgy practice for the CGI.pm team to change default behaviour like this
> in a .01 revision.) This particular problem can be worked round by
> changing the line where CGI.pm is included in dienst_src/Main/dienst.pl
> from:
> use CGI qw(:standard);
> 
> to:
> 
> use CGI qw(:standard -oldstyle_urls);
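
To make the "duff results" concrete, here is a small Perl sketch of a
parser that splits only on '&'. parse_ampersand_only is a hypothetical
stand-in written for illustration, not the actual dienst parsing code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser that only knows the '&' delimiter;
# this is NOT the real dienst code, just the failure mode in miniature.
sub parse_ampersand_only {
    my ($query) = @_;
    my %args;
    for my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        $args{$key} = $value;
    }
    return \%args;
}

# New-style query string, as returned by CGI.pm 2.64's query_string().
my $args = parse_ampersand_only('file-after=2000-01-01;meta-format=oams');

# Only one pair is found, and its value swallows the second argument.
print "$_ => $args->{$_}\n" for sort keys %$args;
# file-after => 2000-01-01;meta-format=oams
```

With -oldstyle_urls in effect, CGI.pm hands back '&'-joined pairs again,
which is why the one-line change restores correct parsing for this request.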
> 
> However, this introduces another problem when using partitionspecs. If I
> send a request like:
> 
> Dienst/Repository/4.0/List-Contents
>      ?partitionspec=physics;hep&file-after=2000-01-01
> 
> CGI.pm now gives the query string to dienst as:
> 
> partitionspec=physics&hep&file-after=2000-01-01
> 
> so it's changing the semicolon in the partitionspec into an &. I tried
> URL-encoding the ; (which sounds like good practice anyway) but CGI
> doesn't decode the ; in this case, so dienst gets:
> 
> partitionspec=physics%3Bhep&file-after=2000-01-01
> 
> I can quite easily fix it so the CogPrints code can decode the string,
> but with interoperability it takes two to tango; anything making a Dienst
> request to CogPrints will have to know to encode the ';'. In the Dienst
> protocol specification (either
> http://www.cs.cornell.edu/cdlrg/dienst/protocols/DienstProtocol.htm or
> http://www.cs.cornell.edu/cdlrg/dienst/protocols/OpenArchivesDienst.htm),
> the example List-Contents request doesn't seem to have an encoded ';',
> even though earlier in the document ';' is listed as a character that
> requires encoding. So what is the policy on this? Should the ';' be
> encoded, in which case the specification document needs to be amended to
> reflect this, or should it be left unencoded, in which case the dienst
> code needs changing if it is to work with recent versions of CGI.pm?
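
For the client side of the "two to tango" problem, a minimal Perl sketch of
building a List-Contents request with every key and value percent-encoded,
so the ';' in a partitionspec travels as %3B. encode_value and
list_contents_url are hypothetical names, and leaving only the RFC 2396
"unreserved" characters unescaped is an assumption about what a Dienst
server would accept.

```perl
use strict;
use warnings;

# Hypothetical client-side helper: percent-encode every character outside
# the RFC 2396 "unreserved" set, so ';', '&', '=', '/' and '#' inside a
# value can never be mistaken for delimiters.
sub encode_value {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical builder for a relative List-Contents URL (version and
# argument names as in the requests quoted above); pairs are joined with '&'
# and sorted by key so the output is deterministic.
sub list_contents_url {
    my (%args) = @_;
    my $query = join '&',
        map { encode_value($_) . '=' . encode_value($args{$_}) } sort keys %args;
    return "Dienst/Repository/4.0/List-Contents?$query";
}

print list_contents_url(
    'partitionspec' => 'physics;hep',
    'file-after'    => '2000-01-01',
), "\n";
# Dienst/Repository/4.0/List-Contents?file-after=2000-01-01&partitionspec=physics%3Bhep
```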

> I also note that in the examples of both Dienst protocol specification
> documents, the Disseminate verb:
> 
> Dienst/Repository/1.0/Disseminate/handlecorp/970101/%23oams/xml
> 
> doesn't encode the / in the full ID "handlecorp/970101", but does encode
> the # ("%23oams"). (I even came across the big kludge in the dienst code
> to handle this case!) Requiring some special characters to be encoded but
> others to be left unencoded seems to be an inconsistency in the protocol
> that needs clearing up.
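
A Perl sketch of the consistent alternative: encode every variable path
component of a Disseminate request with the same rule, so the '/' inside
the full ID becomes %2F just as '#' becomes %23. encode_segment and
disseminate_url are hypothetical names. As David notes, Apache may refuse
to handle an encoded '/' in the path, so this form might not be servable
without server-side changes.

```perl
use strict;
use warnings;

# Hypothetical helper: encode one path segment, escaping everything outside
# the RFC 2396 unreserved set (so '/' becomes %2F and '#' becomes %23).
sub encode_segment {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.!~*'()])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical builder: apply the same encoding to the full ID, the
# metadata format and the type, instead of escaping '#' but leaving '/' bare.
sub disseminate_url {
    my ($full_id, $format, $type) = @_;
    return join '/', 'Dienst/Repository/1.0/Disseminate',
        map { encode_segment($_) } $full_id, $format, $type;
}

print disseminate_url('handlecorp/970101', '#oams', 'xml'), "\n";
# Dienst/Repository/1.0/Disseminate/handlecorp%2F970101/%23oams/xml
```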

> R
> 
> -- 
>  Robert Tansley                    Tel: +44 (0) 23 80594492
>  Multimedia Research Group         Fax: +44 (0) 23 80592865
>  Electronics & Computer Science    http://www.ecs.soton.ac.uk/~rht96r/
>  University of Southampton
>  Southampton SO17 1BJ, UK