[UPS] RE: Further issues with Dienst software and Open Archives

David Fielding fielding@cs.cornell.edu
Wed, 24 May 2000 11:33:30 -0400


Robert,

	This email discusses two problems with the OA version of Dienst:
1) CGI.pm URL handling changes and 2) encoding issues.

1) CGI.pm 

	The CGI.pm Perl module recently changed its handling of URLs. The
HTML 4.0 specification, which CGI.pm follows, recommends the semi-colon ';'
in place of the '&' character as the separator in URL query strings, and
CGI.pm now uses ';' by default.

	Dienst expects the '&' character as the separator for optional
arguments. Dienst also uses the semi-colon ';' in the partitionspec option
to separate partition component tags. The ';' is not encoded, so it cannot
be told apart from the new separator.

	Temporary solution: Run an older version of CGI.pm (prior to the
change of separator).

	Long Term Solution:
		1) Modify Dienst to accept URLs with either '&' or ';' as a
		separator character. This will allow different versions of
		Dienst to continue to interoperate.
		2) Encode the semi-colon in the partitionspec, or change the
		partitionspec separator character. Encoding seems like an
		easy fix.
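
	A minimal sketch of long-term fixes 1) and 2) taken together, written
	in Python rather than the Perl that Dienst uses, purely for
	illustration. The function name parse_dienst_query is hypothetical and
	is not part of Dienst or CGI.pm. It splits on either separator first
	and percent-decodes afterwards, so an encoded ';' (%3B) inside a
	partitionspec survives as data:

```python
import re
from urllib.parse import unquote


def parse_dienst_query(query):
    """Parse a query string whose pairs are delimited by '&' or ';'.

    Hypothetical helper for illustration; not taken from the Dienst source.
    """
    pairs = []
    # Split on the raw separators before decoding, so that an encoded
    # ';' (%3B) or '&' (%26) inside a value is not mistaken for a delimiter.
    for field in re.split(r"[&;]", query):
        if not field:
            continue  # tolerate empty fields such as a trailing separator
        name, _, value = field.partition("=")
        pairs.append((unquote(name), unquote(value)))
    return pairs


# Old-style and new-style requests parse to the same result.
old_style = parse_dienst_query("file-after=2000-01-01&meta-format=oams")
new_style = parse_dienst_query("file-after=2000-01-01;meta-format=oams")

# An encoded ';' in the partitionspec is kept as part of the value.
encoded = parse_dienst_query("partitionspec=physics%3Bhep&file-after=2000-01-01")

# An unencoded ';' is ambiguous: "hep" is split off as a separate field.
unencoded = parse_dienst_query("partitionspec=physics;hep&file-after=2000-01-01")
```

	The last case shows why accepting both separators is only safe once
	the ';' inside a partitionspec is encoded.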

2) Encoding

	The Dienst protocol is inconsistent when it comes to encoding
special characters.

	Robert points out that we no longer encode the handle argument.
	The Dienst hack works because we know there is at most one handle
	and we know the number of fixed arguments. It actually works quite
	well.
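
	The idea can be sketched as follows, in Python for illustration only.
	This is not the actual Dienst code: the function name
	split_disseminate_path is hypothetical, and the counts of fixed leading
	and trailing segments are assumptions read off the Disseminate example
	in Robert's original message.

```python
from urllib.parse import unquote

# Assumed layout, inferred from the example request in this thread:
#   Dienst/<service>/<version>/<verb>/<handle...>/<arg>/<arg>
LEADING_FIXED = 4   # "Dienst", service, version, verb
TRAILING_FIXED = 2  # fixed arguments that follow the handle


def split_disseminate_path(path):
    """Recover an unencoded handle from a Disseminate-style path.

    Hypothetical helper; not the Dienst implementation.
    """
    segments = path.strip("/").split("/")
    if len(segments) < LEADING_FIXED + TRAILING_FIXED + 1:
        raise ValueError("path too short to contain a handle")
    head = segments[:LEADING_FIXED]
    tail = segments[-TRAILING_FIXED:]
    # Everything between the fixed parts is the handle, whatever number
    # of '/' characters it contains.
    handle = "/".join(segments[LEADING_FIXED:-TRAILING_FIXED])
    return head, handle, [unquote(arg) for arg in tail]


head, handle, args = split_disseminate_path(
    "Dienst/Repository/1.0/Disseminate/handlecorp/970101/%23oams/xml"
)
```

	Under these assumptions the handle comes back as "handlecorp/970101"
	even though its '/' was never encoded.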
	
	Even more problematic for encoding the '/' character in handles is
	the fact that Apache refuses to accept URLs with encoded slashes.
	The most widely used web server deems encoded slashes a security
	threat, so in order to support encoded '/'s every site must apply a
	fix we provide, then recompile and install Apache. This is another
	reason the handle argument became unencoded over time.

	Possible Solutions:

		1) Encode everything except slash characters.
		2) Encode everything. Force people to patch/recompile Apache
		or convince Apache folks to support encoded slash characters.

	It seems encoding everything is the correct solution, but Apache's
	reluctance to accept encoded slashes is something to consider.
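
	The two options differ only in whether '/' is treated as a safe
	character. A small Python sketch, for illustration only, using the
	standard urllib.parse.quote function; the handle value is the one from
	the protocol document example, and the helper name encode_handle is
	hypothetical:

```python
from urllib.parse import quote, unquote


def encode_handle(handle, encode_slashes):
    """Percent-encode a handle for use in a URL path.

    Hypothetical helper. With encode_slashes=False this corresponds to
    option 1 (slashes left alone); with True, to option 2.
    """
    safe = "" if encode_slashes else "/"
    return quote(handle, safe=safe)


option_1 = encode_handle("handlecorp/970101", encode_slashes=False)
option_2 = encode_handle("handlecorp/970101", encode_slashes=True)

# Both forms decode back to the same handle.
round_trip = unquote(option_2)

# Other reserved characters are encoded under either option.
fragment = quote("#oams", safe="")
```

	With option 1 a server still has to locate the handle by position, as
	it does now; option 2 removes that ambiguity but produces the %2F
	sequences that an unpatched Apache rejects.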

	The other issue to consider is dissemination of these changes to
	sites that think they are running an OA compatible repository, or to
	sites that are currently implementing an OA compatible site.

	Let me know what you decide as a group so I can start working on the
	necessary changes.

Hope this helps, 
David

-----Original Message-----
From: David Fielding 
Sent: Tuesday, May 23, 2000 4:15 PM
To: Carl Lagoze; 'Robert Tansley'
Cc: 'OA discussion list'
Subject: RE: Further issues with Dienst software and Open Archives


Robert,

	I am looking into this and so far I don't like what I see.
Seems the HTML 4.0 standard drops the '&' character as a separator in URLs!!
CGI.pm is moving to implement the HTML 4.0 spec.

	I believe that if we encode ';' characters in the partitionspec and
modify Dienst to accept '&' or ';' delimited URLs, while decoding the
encoded ';' characters, we will be fine.

	Lincoln Stein suggests using an older version of CGI.pm for now.
The change is inevitable, so we need to deal with it; it might as well be now.

	I will try to spend some time on this tomorrow and write up
a more thorough response. I will also try to address encoding issues
which are complicated by Apache's refusal to handle encoded '/' characters.

More tomorrow,
David

-----Original Message-----
From: Carl Lagoze 
Sent: Tuesday, May 23, 2000 5:54 AM
To: 'Robert Tansley'; David Fielding
Cc: OA discussion list
Subject: RE: Further issues with Dienst software and Open Archives


Hi Rob,  I believe David Fielding is looking into this?  Let me know if not.

Looks like we'll have some interesting talks in San Antonio next week.

Regards,

Carl

> -----Original Message-----
> From: Robert Tansley [mailto:rht96r@ecs.soton.ac.uk]
> Sent: Friday, May 19, 2000 11:05 AM
> To: lagoze@CS.Cornell.EDU; help@ncstrl.org
> Cc: OA discussion list
> Subject: Further issues with Dienst software and Open Archives
> 
> 
> Hi, an even more technical mail this time. (I hope these are appropriate
> addresses to send this to; if not, please point me in the right
> direction.)
> 
> Working on adding Open Archives support to CogPrints, I've discovered an
> issue with the latest version of CGI and the dienst code. As of version
> 2.64, the CGI.pm module by default uses "new style" URLs, in which the
> keyword/value query string is passed to the application delimited by
> semi-colons ";" instead of ampersands "&". E.g. if I send a request:
> 
> Dienst/Repository/4.0/List-Contents?file-after=2000-01-01&meta-format=oams
> 
> The dienst software, from CGI::query_string(), receives the arguments as:
> 
> file-after=2000-01-01;meta-format=oams
> 
> so later on parsing this string produces duff results. (IMO it's a rather
> dodgy practice for the CGI.pm team to change default behaviour like this
> in a .01 revision.) This particular problem can be worked round by
> changing the line where CGI.pm is included in dienst_src/Main/dienst.pl
> from:
> 
> use CGI qw(:standard);
> 
> to:
> 
> use CGI qw(:standard -oldstyle_urls);
> 
> However, this introduces another problem when using partitionspecs. If I
> send a request like:
> 
> Dienst/Repository/4.0/List-Contents
>      ?partitionspec=physics;hep&file-after=2000-01-01
> 
> CGI.pm now gives the query string to dienst as:
> 
> partitionspec=physics&hep&file-after=2000-01-01
> 
> so it's changing the semicolon in the partitionspec into an &. I tried
> URL-encoding the ; (which sounds like good practice anyway) but CGI
> doesn't decode the ; in this case, so dienst gets:
> 
> partitionspec=physics%3Bhep&file-after=2000-01-01
> 
> I can quite easily fix it so the CogPrints code can decode the string,
> but with interoperability it takes two to tango; anything making a Dienst
> request to CogPrints will have to know to encode the ';'. In the Dienst
> protocol specification (either
> http://www.cs.cornell.edu/cdlrg/dienst/protocols/DienstProtocol.htm or
> http://www.cs.cornell.edu/cdlrg/dienst/protocols/OpenArchivesDienst.htm),
> the example List-Contents request doesn't seem to have an encoded ';',
> even though earlier in the document ';' is listed as a character that
> requires encoding. So what is the policy on this? Should the ';' be
> encoded, in which case the specification document needs to be amended to
> reflect this, or should it be left unencoded, in which case the dienst
> code needs changing if it is to work with recent versions of CGI?
> 
> I also note that in the examples of both Dienst protocol specification
> documents, the disseminate verb:
> 
> Dienst/Repository/1.0/Disseminate/handlecorp/970101/%23oams/xml
> 
> doesn't require the encoding of the / in the full ID "handlecorp/970101",
> but does the # ("%23oams"). (I even came across the big kludge in the
> dienst code to handle this case!) Requiring some special characters to be
> encoded but others to be left unencoded seems to be an inconsistency in
> the protocol that needs clearing up.
> 
> R
> 
> -- 
>  Robert Tansley                    Tel: +44 (0) 23 80594492
>  Multimedia Research Group         Fax: +44 (0) 23 80592865
>  Electronics & Computer Science    http://www.ecs.soton.ac.uk/~rht96r/
>  University of Southampton
>  Southampton SO17 1BJ, UK