[UPS] Post workshop thoughts: URL encoding, partitionspec documentation, meta-data, deletions/modification

Simeon Warner simeon@mmm.lanl.gov
Mon, 5 Jun 2000 17:53:05 -0600 (MDT)


URL encoding of Dienst requests
-------------------------------

The problem with '/' in fixed arguments exists because they
are encoded in the PATH_INFO. I think it is not just a 
problem with Apache but instead is a `feature' of the CGI
specification that all characters in the PATH_INFO should
be decoded:
  http://hoohoo.ncsa.uiuc.edu/cgi/env.html
and the notes at the Apache site
  http://bugs.apache.org/index/full/876

This problem goes away if we change the syntax to encode all
arguments as keyword arguments (name=value pairs) where the 
encoding and decoding are well defined and not done by the server.
This need not change the semantics in any way, although there 
would no longer be any need to enforce a particular ordering of the
arguments.

Hence:
  /Dienst/Repository/2.0/Structure/handlecorp/970101?view=%23
would become
  /Dienst/Repository/2.0/Structure?fullID=handlecorp%2f970101&view=%23
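
As a rough illustration of the keyword-argument form, here is a minimal
Python sketch. The helper name dienst_url is hypothetical and not part
of any Dienst implementation; it only assumes the standard library's
urllib.parse, which percent-encodes '/' and '#' inside values so the
server never has to decode anything in PATH_INFO.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def dienst_url(service, version, verb, **args):
    """Build a request URL with every argument as a name=value pair.

    Hypothetical helper for illustration only.
    """
    # urlencode percent-encodes reserved characters in values,
    # including '/' (%2F) and '#' (%23), and joins pairs with '&'.
    query = urlencode(args)
    return f"/Dienst/{service}/{version}/{verb}?{query}"

url = dienst_url("Repository", "2.0", "Structure",
                 fullID="handlecorp/970101", view="#")
print(url)

# Decoding is done by the application, not the web server, and the
# result does not depend on the order of the pairs.
decoded = parse_qs(urlsplit(url).query)
print(decoded["fullID"][0], decoded["view"][0])
```

The hex digits in a percent-escape are case-insensitive, so %2F here
and %2f in the example above denote the same character.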


Partitionspec needs to be clarified in spec
-------------------------------------------

I think some extra explanation would make things clearer.

In the `Institutions;Florida;Frenetics' example it might be noted
that `Institutions;Florida' specifies all records in partitions 
Florida and below. `Institutions;Frenetics' or simply `Frenetics' 
are not valid partition specifications.
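
A small Python sketch of the reading suggested above may help: a
partition specification is a path from the top of the hierarchy, and it
selects that partition and everything below it. The nested-dict
representation and the function names are assumptions made for
illustration; the spec does not prescribe any data structure.

```python
# Hypothetical partition hierarchy, taken from the spec's example.
PARTITIONS = {"Institutions": {"Florida": {"Frenetics": {}}}}

def is_valid_spec(spec, tree=PARTITIONS):
    """A spec is valid only if it names a path starting at the root."""
    node = tree
    for name in spec.split(";"):
        if name not in node:
            return False
        node = node[name]
    return True

def selects(spec, record_partition):
    """True if a record filed under record_partition is in spec or below."""
    want = spec.split(";")
    have = record_partition.split(";")
    return have[:len(want)] == want

print(is_valid_spec("Institutions;Florida"))    # valid path from the root
print(is_valid_spec("Institutions;Frenetics"))  # skips a level
print(is_valid_spec("Frenetics"))               # does not start at the root
print(selects("Institutions;Florida", "Institutions;Florida;Frenetics"))
```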

It should also be stated that software accessing OA compliant
servers may choose to ignore the partitions completely; they
are provided because they have been found convenient but are
not intended to restrict the views of an archive that an OA
compliant service shows.


Deletions and modifications
---------------------------

Is there any `record deleted' type reply in full Dienst? arXiv does
not need this but there were several people at the OAi workshop
expressing a need for it. Should we refine the Santa Fe text which 
refers to `persistent identifiers' in some way to at least admit
the possibility of occasional deletions?

Are modifications adequately taken care of using the revisionDate?
The spec currently says that _file-after_ `limits the list to those
full identifiers for records that were added or modified since _date_'.
Extending this idea to include _file-before_ (which I hope will be
added), it seems that identifiers for records that changed in any 
way within the specified date range should be returned. Thus an
identifier will be included if the accessionDate or the revisionDate
(and possibly other dates for, say, intermediate revisions) falls
within the range. Viewed this way it seems that the question is
`will the latest modification be reflected in the revisionDate?'.
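
The date-range reading above could be sketched as follows. This is only
an illustration under stated assumptions: the record layout (an "id"
plus a list of "dates" holding the accessionDate, revisionDate and any
intermediate dates), the function name, the second identifier, and the
choice of inclusive bounds are all mine and not taken from the spec.

```python
from datetime import date

def changed_ids(records, file_after=None, file_before=None):
    """Return ids of records with any date inside the requested range.

    Bounds are treated as inclusive (an assumption); a missing bound
    leaves that side of the range open.
    """
    def in_range(d):
        return ((file_after is None or d >= file_after) and
                (file_before is None or d <= file_before))
    return [r["id"] for r in records if any(in_range(d) for d in r["dates"])]

# Hypothetical records: accession date first, later revisions after it.
records = [
    {"id": "handlecorp/970101", "dates": [date(1997, 1, 1), date(2000, 5, 20)]},
    {"id": "handlecorp/990315", "dates": [date(1999, 3, 15)]},
]

print(changed_ids(records, file_after=date(2000, 1, 1),
                  file_before=date(2000, 6, 1)))
```

Only the first record is returned, because its revision falls in the
range even though its accession does not. If a server kept only the
latest revisionDate, a range that ends before that date would miss the
record's earlier changes.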


Meta-data
---------

I am still strongly in favour of a very limited meta-data set
for OA compliance. In many ways this view was strengthened by
the diversity of potential applications described at the meeting.

Regarding Carl's proposed switch to using DC namespace, what do
we do about the things that don't match? Having a mix of
OAMS and DC words seems to me even worse than having a set
of OAMS words (albeit with an understood partial-mapping to
DC).

Is there really no place for `displayID' and `Comments' in DC?

Although we (arXiv) still have issues with allowing unrestricted
full-data access, I think that the OAi should suggest a standard 
mechanism by which archives can allow access to full-data. I would 
prefer to see an extended Dienst subset for this rather than adding
extra meta-data elements (say `fullDataID').



Anyway, it was good to see everyone at the meeting and I hope the
discussions will spur us all on. 

Cheers,
Simeon