[OAI-implementers] Re: [oai-alpha] Re: Help!

Simeon Warner simeon@lanl.gov
Mon, 29 Jan 2001 11:10:33 -0700 (MST)


On Mon, 29 Jan 2001, herbert van de sompel wrote:
> hi all,
> 
> Quite an interesting discussion regarding the exception handling (400,
> empty records et al.).
> 
> When writing the specs as a result of the discussion over the weekend 2
> weeks ago, my goal was to distinguish between the handling of:
> - illegal verbs, 
> - illegal arguments,
> - values (both illegal and not leading to any result).
> 
> I thought that was a pretty clear line to draw:
> 
> - illegal protocol syntax: usage of illegal verbs and illegal arguments
> results in 400
> - something wrong with argument values: usage of illegal values and/or
> values that lead to no result yield "empty" responses
> 
> The current discussion seems to suggest that illegal values should also
> result in a 400.  But the discussion also shows that defining what
> an illegal value is isn't all that simple: illegal with respect to the protocol
> (e.g. illegal format for datestamp) or illegal in a certain repository
> (e.g. non-supported metadataPrefix), ...

I disagree here. I think the discussion has only been about what is
illegal from the protocol point of view, and that we all agree that
illegal/unsupported/unknown from a repository point of view should
lead to empty responses.

It is not clear to me why it is beneficial to silently accept requests
that include syntactically incorrect arguments which clearly indicate
errors in a harvester program. 400 seems more sensible.
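
To make the line being drawn here concrete, a minimal Python sketch (3.9+). It checks only the syntax of the from/until, set and metadataPrefix values, using the set and metadataPrefix patterns quoted further down this thread; the YYYY-MM-DD date format and the names is_valid_date and status_for are assumptions for illustration, not part of the spec. A well-formed request naming an unknown set or an unsupported metadataPrefix still gets 200 and an empty response.

```python
import re
from datetime import date

# Patterns quoted in this thread from the verb response schemas, assumed
# here to apply to request arguments too. fullmatch() anchors them.
SET_SPEC = re.compile(r"([a-zA-Z0-9_])+(:[A-Za-z0-9]+)*")
METADATA_PREFIX = re.compile(r"[a-zA-Z0-9_]+")
# Assumed datestamp syntax for this sketch: YYYY-MM-DD.
DATESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}")


def is_valid_date(value: str) -> bool:
    """True if value has YYYY-MM-DD syntax and is a real calendar date."""
    if not DATESTAMP.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def status_for(args: dict[str, str]) -> int:
    """Return 400 for syntactically illegal argument values, else 200.

    200 only means the request is well formed: an unknown set or an
    unsupported metadataPrefix still leads to an empty response.
    """
    for key in ("from", "until"):
        if key in args and not is_valid_date(args[key]):
            return 400
    if "set" in args and not SET_SPEC.fullmatch(args["set"]):
        return 400
    prefix = args.get("metadataPrefix")
    if prefix is not None and not METADATA_PREFIX.fullmatch(prefix):
        return 400
    return 200
```

For example, status_for({"set": "physics:"}) returns 400 because the trailing colon fails the set pattern, while status_for({"metadataPrefix": "wibble"}) returns 200: the value is well formed, so a repository that doesn't support it simply answers with an empty response.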

> While I do agree that the provision of a datestamp with an illegal
> syntax can be considered to be illegal protocol syntax, I remain tempted
> to stick with the original concept, whereby everything that is related
> to values of arguments is NOT handled with a 400, but with an "empty"
> response. Please note that the out-of-context usage of a
> resumptionToken falls under the category of "illegal protocol syntax"
> because the section on resumptionTokens explicitly says "all other
> usage of resumptionTokens is illegal and hence returns 400". 
> 
> I think that the issue re oai-format identifiers that was brought up
> supports the above approach: some repos will use oai-formatted
> identifiers, others will not.

From the protocol point of view, only invalid URIs are invalid
identifiers; any other restriction is repository specific.
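
A rough Python sketch of that rule, assuming a deliberately loose regular expression for an absolute URI (an RFC 2396-style scheme, a colon, then at least one non-whitespace character) rather than a full URI grammar. The function identifier_status and the set of known identifiers are hypothetical. A repository-specific restriction, such as only holding records in its own oai:... namespace, shows up as an empty result rather than a 400, which also leaves room for archives that mirror records from elsewhere.

```python
import re

# Loose approximation of an absolute URI: scheme, colon, then at least
# one non-whitespace character. NOT a full RFC 2396 validator.
ABSOLUTE_URI = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:\S+")


def identifier_status(identifier: str, known: set[str]) -> tuple[int, bool]:
    """Return (HTTP status, record_found) for a GetRecord-style lookup.

    Only a syntactically invalid URI is a protocol error (400). A valid
    URI that this repository does not hold gives 200 and an empty result.
    """
    if not ABSOLUTE_URI.fullmatch(identifier):
        return 400, False
    return 200, identifier in known
```

With known = {"oai:example:1"} (a hypothetical identifier), identifier_status("oai:other:2", known) gives (200, False), an empty response, while identifier_status("not a uri", known) gives (400, False).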
 
Cheers,
Simeon


> Similarly, other xsd's will be used as
> "description" containers, some of which may limit the validity of other
> argument values (valid set values, for instance).  If we take the return
> of 400 down to the level of argument values, and in addition take into
> account these repo (or community) specific issues in the decision
> whether an argument value is "legal" or "illegal" (hence in deciding
> whether to return 400 or not), all repos will do exception handling in
> different manners.  I am not too enthusiastic about that idea.
> 
> I am very interested in comments.
> 
> herbert
> 
> Simeon Warner wrote:
> > 
> > On Sat, 27 Jan 2001, ePrints Support wrote:
> > 
> > > (from oai-alpha)
> > >
> > > On Fri, Jan 26, 2001 at 06:26:33PM -0700, Simeon Warner wrote:
> > > > I agree that the following can be illegal in fairly obvious
> > > > ways:
> > > >   from & until     (illegal dates)
> > > >   identifier       (illegal uri)
> > > >   resumptionToken  (spec says illegal use and expired will give 400)
> > > >
> > > > However, according to the schemas of verbs that return values that
> > > > will be used for set and metadataPrefix, they can also be illegal:
> > > >   set              (doesn't match "([a-zA-Z0-9_])+(:[A-Za-z0-9]+)*")
> > > >   metadataPrefix   (doesn't match "[a-zA-Z0-9_]+")
> > >
> > > from & until: Illegal dates (I agree)
> > >
> > > identifier: what's an illegal uri?
> > >    a) one which doesn't match oai:[a-z]+:.*    (regexp may be slightly off)
> > > or
> > >    b) one which doesn't match oai:nameofarchive:.*
> > > I suggest (a) - it's possible that an archive could mirror OAI records
> > > from another archive AS WELL as its own. (Isn't it?)
> > 
> > My feeling was that only a) qualifies for a 400 response.
> > 
> > > resumptionToken: If an archive doesn't support this
> > >    then I suppose it should always give a 400 error.
> > > Isn't there an 'expired' return code in http? It's confusing giving
> > > the same response for 'illegal' and 'expired'.
> > 
> > The spec says there should be 400 in both cases. Any sensible
> > harvester will know that it is giving back a once-valid
> > resumptionToken and hence 400 => expired.
> > 
> > > set: not matching ([a-zA-Z0-9_])+(:[A-Za-z0-9]+)* is a 400 error but
> > >    how about a set which passes the spec but isn't in the archive?
> > > I suggest it just returns a header with no results in that case.
> > 
> > The second case should certainly return header with no results, only
> > illegal value (not matching regexp) gives 400.
> > 
> > > metadataPrefix: similar. Not matching [a-zA-Z0-9_]+ is illegal (400)
> > >    but what happens if it passes the regexp but isn't in the list
> > >    supported by the archive?
> > > Again, I suggest it just returns a header with no results in this case.
> > 
> > Again, unrecognized/unsupported should return header with no results,
> > only illegal value gives 400.
> > 
> > > Other queries:
> > >    oai_dc: When should we put an 'oai_' before the metadataPrefix,
> > >    and what exactly does it mean (why isn't it just dc?)
> > 
> > My understanding is that the metadataPrefixes are simply strings
> > returned by ListMetadataFormats which may be reused in requests
> > that specify a metadataPrefix to request metadata according to the
> > corresponding schema in the ListMetadataFormats response.
> > Further, 'oai_dc' is the name oai has chosen to refer to dc
> > by (and everyone must support it and not call it 'wibble'
> > instead). Given that the metadataPrefixes are just shorthand
> > names to refer to the schema, I don't know why it was necessary
> > to add the 'oai_'.
> > 
> > >    inside <metadata></metadata> is ANYTHING defined by OAI or
> > >    is everything, including the initial tags <dc></dc> and namespace
> > >    etc. defined by the metadata standard?
> > 
> > As far as I understand it (which is not really very well), everything
> > from the initial <dc ...> tag to the </dc> is specified by the
> > dc schema (http://www.openarchives.org/OAI/dc.xsd), or other
> > schema for other metadata formats. In the dc schema it says:
> > 
> >  <element name="dc" type="dc:dublincoreType"/>
> > 
> > Cheers,
> > Simeon.