[UPS] Re: Further issues with Dienst software and Open Archives

Robert Tansley rht96r@ecs.soton.ac.uk
Thu, 25 May 2000 14:54:19 +0100


David Fielding wrote:
> 
> Oops! I misspoke in my earlier email. The '&' character is still supported,
> but CGI now supports both '&' and ';' as a delimiter, with ';' the internal
> default for the new CGI.pm. The solution of modifying dienst to accept both
> delimiters and encoding the ';' will still work.

Looking at the code, you seem to be re-implementing the param() method of
CGI anyway, in the Common/utilities.pl, parse_options() function. Was
there a specific reason why you didn't use param()? Of course the issue
still needs to be addressed in the spec. It definitely sounds to me like
encoding the ; is the best way.
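
To make the "accept both delimiters, encode the ;" idea concrete, here is
a minimal sketch. It is in Python rather than Dienst's Perl, purely for
illustration, and parse_query() is a hypothetical helper, not anything in
the Dienst source or in CGI.pm. It splits on either delimiter first and
only then percent-decodes, so an encoded %3B inside a partitionspec
survives as a literal ';'.

```python
import re
from urllib.parse import unquote


def parse_query(query):
    """Parse a query string delimited by '&' or ';' (hypothetical helper).

    Splitting happens before percent-decoding, so an encoded ';' (%3B)
    inside a value is never mistaken for a delimiter.
    """
    params = {}
    for pair in re.split(r"[&;]", query):
        if not pair:
            continue  # tolerate empty segments such as a trailing delimiter
        key, _, value = pair.partition("=")
        params[unquote(key)] = unquote(value)
    return params


# The same request in "old style" (&) and "new style" (;) form.
old_style = parse_query("partitionspec=physics%3Bhep&file-after=2000-01-01")
new_style = parse_query("partitionspec=physics%3Bhep;file-after=2000-01-01")
```

Both calls should yield the same parameters, with the partitionspec value
decoded back to "physics;hep". This only works if every client encodes
the ';', which is why the spec needs to say so.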

Apache's handling of the / sounds like a more difficult problem. Can I
offer a solution 3): disallow the use of /'s in Dienst ID's. I really
don't see how changing the / to a : or somesuch is a problem. If an
archive (like arXiv) must use /'s in its own ID's, is changing it into
and back from a : for Dienst requests really going to cause problems?
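
As a rough illustration of how small that translation is, here is a
sketch, again in Python and with hypothetical function names. It assumes
the archive's own IDs never contain a ':' themselves; if they can, a
different substitute character would be needed.

```python
def to_dienst_id(archive_id):
    """Map an archive's native ID to a slash-free form (hypothetical)."""
    if ":" in archive_id:
        # Assumption: native IDs never contain ':'; otherwise the
        # mapping could not be reversed unambiguously.
        raise ValueError("native ID already contains ':'")
    return archive_id.replace("/", ":")


def from_dienst_id(dienst_id):
    """Reverse the mapping when a Dienst request arrives (hypothetical)."""
    return dienst_id.replace(":", "/")


# Round trip using the ID from the spec's Disseminate example.
wire_id = to_dienst_id("handlecorp/970101")
native_id = from_dienst_id(wire_id)
```

Here wire_id contains no '/', so nothing in the request path would need
to be sent as an encoded slash.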

R

> 
> -----Original Message-----
> From: David Fielding
> Sent: Wednesday, May 24, 2000 11:34 AM
> To: David Fielding; Carl Lagoze; 'Robert Tansley'
> Cc: 'OA discussion list'
> Subject: RE: Further issues with Dienst software and Open Archives
> 
> Robert,
> 
>         This email discusses two problems with the OA version of Dienst 1)
> CGI.pm URL handling changes and 2) encoding issues.
> 
> 1) CGI.pm
> 
>         The CGI.pm Perl module recently changed its handling of URLs. The
> HTTP 4.0 specification, which CGI.pm follows, has eliminated the '&'
> character as a separator in URLs and has replaced it with the
> semi-colon ';'.
> 
>         Dienst expects the '&' character as the separator for optional
> arguments. Dienst
> also uses the semi-colon ';' in the partitionspec option to separate
> partition component
> tags. The ';' is not encoded.
> 
>         Temporary solution: Run an older version of CGI.pm (prior to the
> change of separator).
> 
>         Long Term Solution:
>                 1) Modify Dienst to accept URLs with either '&' or ';' as a
> separator character.
>                 This will allow different versions of Dienst to continue to
> interoperate.
>                 2) Encode the semi-colon, or change the separator character.
> Encoding seems like
>                 an easy fix.
> 
> 2) Encoding
> 
>         The Dienst protocol is inconsistent when it comes to encoding
> special characters.
> 
>         Robert points out that we do not (no longer) encode the handle
> argument. The
>         Dienst hack works because we know there is at most one handle and we
> know
>         the number of fixed arguments. It actually works quite well.
> 
>         Even more problematic for encoding the '/' character in handles is
>         the fact that Apache refuses to accept URLs with encoded slashes.
>         The most widely used web server deems encoded slashes a security
>         threat, so in order to support encoded '/'s every site must apply a
>         fix we provide, then recompile and install Apache. This is another
>         reason the handle argument became unencoded over time.
> 
>         Possible Solutions:
> 
>                 1) Encode everything except slash characters.
>                 2) Encode everything. Force people to patch/recompile Apache
> or convince
>                 Apache folks to support encoded slash characters.
> 
>         It seems encoding everything is the correct solution, but Apache's
> reluctance
>         to accept encoded slashes is something to consider.
> 
>         The other issue to consider is dissemination of these changes to
> sites that think
>         they are running an OA compatible repository, or to sites that are
> currently
>         implementing an OA compatible site.
> 
>         Let me know what you decide as a group so I can start working on the
> necessary
> changes.
> 
> Hope this helps,
> David
> 
> -----Original Message-----
> From: David Fielding
> Sent: Tuesday, May 23, 2000 4:15 PM
> To: Carl Lagoze; 'Robert Tansley'
> Cc: 'OA discussion list'
> Subject: RE: Further issues with Dienst software and Open Archives
> 
> Robert,
> 
>         I am looking into this and so far I don't like what I see.
> Seems the HTTP 4.0 standard eliminates the '&' character from URLs!!
> CGI.pm is moving to implement the HTTP 4.0 spec.
> 
>         I believe if we encode ';' characters in the partitionspec and
> modify
> Dienst to accept '&' or ';' delimited URLs while decoding the encoded ';'
> characters we will be fine.
> 
>         Lincoln Stein suggests using an older version of CGI.pm for now.
> The change is inevitable so we need to deal with it, might as well be now.
> 
>         I will try to spend some time on this tomorrow and write up
> a more thorough response. I will also try to address encoding issues
> which are complicated by Apache's refusal to handle encoded '/' characters.
> 
> More tomorrow,
> David
> 
> -----Original Message-----
> From: Carl Lagoze
> Sent: Tuesday, May 23, 2000 5:54 AM
> To: 'Robert Tansley'; David Fielding
> Cc: OA discussion list
> Subject: RE: Further issues with Dienst software and Open Archives
> 
> Hi Rob,  I believe David Fielding is looking into this?  Let me know if not.
> 
> Looks like we'll have some interesting talks in San Antonio next week.
> 
> Regards,
> 
> Carl
> 
> > -----Original Message-----
> > From: Robert Tansley [mailto:rht96r@ecs.soton.ac.uk]
> > Sent: Friday, May 19, 2000 11:05 AM
> > To: lagoze@CS.Cornell.EDU; help@ncstrl.org
> > Cc: OA discussion list
> > Subject: Further issues with Dienst software and Open Archives
> >
> >
> > Hi, an even more technical mail this time. (I hope these are
> > appropriate
> > addresses to send this to; if not, please point me in the right
> > direction.)
> >
> > Working on adding Open Archives support to CogPrints, I've
> > discovered an
> > issue with the latest version of CGI and the dienst code. As
> > of version
> > 2.64, the CGI.pm module by default uses "new style" URLs, in which the
> > keyword/value query string is passed to the application delimited by
> > semi-colons ";" instead of ampersands "&". E.g. if I send a request:
> >
> > Dienst/Repository/4.0/List-Contents?file-after=2000-01-01&meta-format=oams
> >
> > The dienst software, from CGI::query_string(), receives the
> > arguments as:
> >
> > file-after=2000-01-01;meta-format=oams
> >
> > so later on parsing this string produces duff results. (IMO
> > it's a rather
> > dodgy practice for the CGI.pm team to change default
> > behaviour like this
> > in a .01 revision.) This particular problem can be worked round by
> > changing the line where CGI.pm is included in
> > dienst_src/Main/dienst.pl
> > from:
> >
> > use CGI qw(:standard);
> >
> > to:
> >
> > use CGI qw(:standard -oldstyle_urls);
> >
> > However, this introduces another problem when using
> > partitionspecs. If I
> > send a request like:
> >
> > Dienst/Repository/4.0/List-Contents
> >      ?partitionspec=physics;hep&file-after=2000-01-01
> >
> > CGI.pm now gives the query string to dienst as:
> >
> > partitionspec=physics&hep&file-after=2000-01-01
> >
> > so it's changing the semicolon in the partitionspec into an &. I tried
> > URL-encoding the ; (which sounds like good practice anyway) but CGI
> > doesn't decode the ; in this case, so dienst gets:
> >
> > partitionspec=physics%3Bhep&file-after=2000-01-01
> >
> > I can quite easily fix it so the CogPrints code can decode the string,
> > but with interoperability it takes two to tango; anything
> > making a Dienst
> > request to CogPrints will have to know to encode the ';'. In
> > the Dienst
> > protocol specification (either
> > http://www.cs.cornell.edu/cdlrg/dienst/protocols/DienstProtocol.htm or
> > http://www.cs.cornell.edu/cdlrg/dienst/protocols/OpenArchivesDienst.htm),
> > the example List-Contents request doesn't seem to have an encoded ';',
> > even though earlier in the document ';' is listed as a character that
> > requires encoding. So what is the policy on this? Should the ';' be
> > encoded, in which case the specification document needs to be amended to
> > reflect this, or should it be left unencoded, in which case the dienst
> > code needs changing if it is to work with recent versions of CGI?
> >
> > I also note that in the examples of both Dienst protocol specification
> > documents, the disseminate verb:
> >
> > Dienst/Repository/1.0/Disseminate/handlecorp/970101/%23oams/xml
> >
> > doesn't require the encoding of the / in the full ID "handlecorp/970101",
> > but does require it for the # ("%23oams"). (I even came across the big
> > kludge in the dienst code to handle this case!) Requiring some special
> > characters to be encoded but others to be left unencoded seems to be an
> > inconsistency in the protocol that needs clearing up.
> >
> > R
> >
> > --
> >  Robert Tansley                    Tel: +44 (0) 23 80594492
> >  Multimedia Research Group         Fax: +44 (0) 23 80592865
> >  Electronics & Computer Science    http://www.ecs.soton.ac.uk/~rht96r/
> >  University of Southampton
> >  Southampton SO17 1BJ, UK

-- 
 Robert Tansley                    Tel: +44 (0) 23 80594492
 Multimedia Research Group         Fax: +44 (0) 23 80592865
 Electronics & Computer Science    http://www.ecs.soton.ac.uk/~rht96r/
 University of Southampton
 Southampton SO17 1BJ, UK