[UPS] Post workshop thoughts: URL encoding, partitionspec
Fri, 09 Jun 2000 18:05:35 +0100
Simeon Warner wrote:
> URL encoding of Dienst requests
> The problem with '/' in fixed arguments exists because they
> are encoded in the PATH_INFO. I think it is not just a
> problem with Apache but instead is a `feature' of the CGI
> specification that all characters in the PATH_INFO should
> be decoded:
> and notes at Apache site
> This problem goes away if we change the syntax to encode all
> arguments as keyword arguments (name=value pairs) where the
> encoding and decoding are well defined and not done by the server.
> This need not change the semantics in any way although there
> would no longer be any need to enforce a particular ordering of the
> would become
This sounds good to me; I couldn't really see any reason for having the
fixed arguments specified as hierarchical components in the URI anyway.
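
As a rough illustration of the keyword-argument idea quoted above, the
sketch below encodes a request whose identifier contains a '/' as
name=value pairs and decodes it again. The argument names (verb,
identifier, format) and the example values are assumptions made up for
illustration, not syntax taken from the spec.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical argument names and values; the thread does not fix a syntax.
args = {
    "verb": "Disseminate",
    "identifier": "physics/0001001",  # contains '/', the problem character
    "format": "oams",
}

# Encoding is done by the client: '/' becomes %2F inside the query string,
# so it never appears in PATH_INFO where the server would decode it.
query = urlencode(args)

# Decoding is done by the application, not the web server.
decoded = {name: values[0] for name, values in parse_qs(query).items()}

print(query)
print(decoded == args)
```

Because the arguments are named, their order in the query string no
longer matters to the receiving end.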
> Partitionspec needs to be clarified in spec
> I think some extra explanation would make things clearer.
> In the `Institutions;Florida;Frenetics' example it might be noted
> that `Institutions;Florida' specifies all records in partitions
> Florida and below. `Institutions;Frenetics' or simply `Frenetics'
> are not valid partition specifications.
> It should also be stated that software accessing OA compliant
> servers may choose to ignore the partitions completely; they
> are provided because they have been found convenient but are
> not intended to restrict the views of an archive that an OA
> compliant service shows.
I think we should clear up what a partition is, and why it should be
there in the spec. I think partitioning should be thought of as a
service, and as I mentioned at the workshop I don't see the usefulness
of having arbitrary partitioning in the spec, especially if not all
archives are even going to use it.
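
For reference, the `Institutions;Florida;Frenetics' example quoted above
can be read as prefix matching on a partition path. The following is only
a sketch of that reading; the function names and the tuple representation
are assumptions, and here a specification that does not start at the top
of the hierarchy simply matches nothing rather than being rejected as
invalid.

```python
def parse_partitionspec(spec):
    """Split a ';'-separated partition specification into path components."""
    return tuple(spec.split(";"))


def in_partition(record_path, spec):
    """Return True if the record lies in the named partition or below it."""
    spec_path = parse_partitionspec(spec)
    # A spec selects every record whose path starts with the spec's path.
    return record_path[: len(spec_path)] == spec_path


# A record filed under Institutions > Florida > Frenetics.
record = ("Institutions", "Florida", "Frenetics")

print(in_partition(record, "Institutions;Florida"))    # Florida and below
print(in_partition(record, "Institutions;Frenetics"))  # skips a level
print(in_partition(record, "Frenetics"))               # not rooted at the top
```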
> Deletions and modifications
> Is there any `record deleted' type reply in full Dienst? arXiv does
> not need this but there were several people at the OAi workshop
> expressing a need for it. Should we refine the Santa Fe text which
> refers to `persistent identifiers' in some way to at least admit
> the possibility of occasional deletions?
> Are modifications adequately taken care of using the revisionDate?
> The spec currently says that _file-after_ `limits the list to those
> full identifiers for records that were added or modified since _date_'.
> Extending this idea to include _file-before_ (which I hope will be
> added), it seems that identifiers for records that changed in any
> way within the specified date range should be returned. Thus an
> identifier will be included if the accessionDate or the revisionDate
> (and possibly other dates for, say, intermediate revisions) falls
> within the range. Viewed this way it seems that the question is
> `will the latest modification be reflected in the revisionDate?'.
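
The date-range reading quoted above can be sketched as follows: an
identifier is returned if either its accessionDate or its revisionDate
falls within the requested range. The record layout, the function name
and the choice of inclusive bounds are all assumptions for illustration;
the spec text quoted here does not say whether the bounds are inclusive.

```python
from datetime import date


def changed_in_range(record, file_after=None, file_before=None):
    """True if any known date of the record falls within the range.

    Bounds are treated as inclusive (an assumption); None means unbounded.
    """
    candidates = [record["accessionDate"], record.get("revisionDate")]
    for d in candidates:
        if d is None:
            continue  # record was never revised
        if file_after is not None and d < file_after:
            continue
        if file_before is not None and d > file_before:
            continue
        return True
    return False


# Hypothetical records keyed by identifier.
records = {
    "id-1": {"accessionDate": date(2000, 1, 10), "revisionDate": date(2000, 6, 1)},
    "id-2": {"accessionDate": date(2000, 2, 1), "revisionDate": None},
}

selected = [
    identifier
    for identifier, record in records.items()
    if changed_in_range(record, date(2000, 5, 1), date(2000, 6, 30))
]
print(selected)
```

Under this reading a record only shows up for its latest change if that
change is actually written to the revisionDate, which is the question
raised in the quoted text.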
Perhaps a removal could likewise be treated as a modification, and the
response to a List-Contents could include a <removed> element or
> I am still strongly in favour of a very limited meta-data set
> for OA compliance. In many ways this view was strengthened by
> the diversity of potential applications described at the meeting.
There is a difference between the OAMS (which currently includes 9
fields) and the _required_ (mandatory) part of the OAMS (4 fields).
Surely we can expand the OAMS to include more fields without making the
new fields mandatory? This means that archives can still participate
without much effort, but those archives wishing to participate more
fully can provide more metadata and thus be better utilised by service
providers. This in itself should be incentive for archives to provide
better metadata, since users may be able to access their contents using
a wider range of services, thus increasing readership of the archive
(which is presumably something they want).
> Regarding Carl's proposed switch to using DC namespace, what do
> we do about the things that don't match. Having a mix of
> OAMS and DC words seems to me even worse than having a set
> of OAMS words (albeit with an understood partial-mapping to
Mixing up metadata sets does sound dodgy to me. Even the way that the
OAi allows archives to _not_ support OAMS seems to present problems,
making service provision problematic.
> Is there really no place for `displayID' and `Comments' in DC?
> Although we (arXiv) still have issues with allowing unrestricted
> full-data access, I think that the OAi should suggest a standard
> mechanism by which archives can allow access to full-data. I would
> prefer to see an extended Dienst subset for this rather than adding
> extra meta-data elements (say `fullDataID').
Perhaps a "try-later" response? In the case of huge archives, perhaps
the OA interface should run as a sort of caching proxy service. Besides,
I thought one of the main aims of interoperability in the OAi was to
remove the need for large, centralised archives :)
> Anyway, it was good to see everyone at the meeting and I hope the
> discussions will spur us all on.
I certainly hope so! I think we need to settle on something fairly soon.
There was talk of the need for an "out-of-the-box" open archive, which I
think we more or less have here at Southampton, but I'm hesitant to
release a version that will be obsolete in a couple of months. If the
protocol is changed then hopefully upgrades can be largely transparent,
but if the metadata set changes they won't be, since those running the
archive will have to re-map their metadata sets onto the new OAMS.
Robert Tansley Tel: +44 (0) 23 80594492
Multimedia Research Group Fax: +44 (0) 23 80592865
Electronics & Computer Science http://www.ecs.soton.ac.uk/~rht96r/
University of Southampton
Southampton SO17 1BJ, UK