[OAI-implementers] Re: [OAI-PMH] an error in a regular expression describing an OAI Identifier

Simeon Warner simeon at cs.cornell.edu
Tue May 22 17:21:10 EDT 2007


Hi All,

Agnieszka Lewandowska has pointed out an error in the patterns matching 
domain names in the oai-identifier.xsd schema (message excerpt below): the 
current schema doesn't permit single-letter subdomain names. Such names 
should be permitted (see: http://www.ietf.org/rfc/rfc1035.txt), so I 
propose the following updates, which change the final "+" inside the 
repeated subdomain group to "*":

In the definition of repositoryIdentifierType:

<       <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"/>
---
>       <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"/>

and in the definition of sampleIdentifierType:

<       <pattern value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"/>
---
>       <pattern value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"/>
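The effect of the change can be checked outside a schema validator. The following is a minimal Python sketch, not part of the schema or any OAI tooling: it assumes Python's re syntax behaves like the XML Schema pattern for these expressions, uses fullmatch because schema patterns must match the whole value, and decodes &apos; and &amp; to ' and &. The helper name is_valid and the local part "123" are made up for illustration.

```python
import re

# Local-identifier character class from sampleIdentifierType, with the
# XML entities &apos; and &amp; decoded to ' and &.
LOCAL = r"[a-zA-Z0-9\-_\.!~\*'\(\);/\?:@&=\+$,%]+"

# Domain-name part before and after the proposed change ("+" -> "*").
OLD_DOMAIN = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"
NEW_DOMAIN = r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"

OLD_ID = re.compile("oai:" + OLD_DOMAIN + ":" + LOCAL)
NEW_ID = re.compile("oai:" + NEW_DOMAIN + ":" + LOCAL)


def is_valid(pattern, identifier):
    """Return True if the whole identifier matches the pattern."""
    # XML Schema patterns are implicitly anchored, hence fullmatch.
    return pattern.fullmatch(identifier) is not None


if __name__ == "__main__":
    # "123" is a hypothetical local identifier used only for this check.
    for ident in ["oai:ebipol.p.lodz.pl:123", "oai:arXiv.org:hep-th/9901001"]:
        print(ident, "old:", is_valid(OLD_ID, ident), "new:", is_valid(NEW_ID, ident))
```

Under these assumptions the identifier with the single-letter subdomain "p" is rejected by the old pattern and accepted by the new one, while an identifier that was already valid stays valid.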

I have put the updated schema online at
   http://openarchives.org/OAI/2.0/oai-identifier.xsd.2007-05-22
and there is a test instance at
   http://openarchives.org/OAI/2.0/oai-identifier-test4.xml

This change should not invalidate any currently valid instance. Unless 
someone points out an error I will update the live schema version in a 
week or two.

Cheers,
Simeon



(For the really pedantic, the schema pattern is too broad in that it 
permits a subdomain name ending in a hyphen (e.g. "a-.com"), which is not 
valid according to RFC 1035. Correcting this would make the patterns more 
complicated, and I think it probably isn't worth changing to 
something like
<pattern value="[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?(\.[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?)+"/>
However, we could do this if people see value in it.)
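
The difference between the relaxed pattern and the stricter alternative can be illustrated with a short Python sketch. It is only an approximation of schema validation: it assumes Python's re syntax matches the XML Schema pattern semantics for these expressions, and the names LABEL, STRICT and LOOSE are made up for this example.

```python
import re

# One label in the stricter form: starts with a letter and, if longer
# than one character, ends with a letter or digit (never a hyphen).
LABEL = r"[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?"

# Stricter domain pattern: two or more such labels joined by dots.
STRICT = re.compile(LABEL + r"(\." + LABEL + r")+")

# Relaxed domain pattern from the proposed repositoryIdentifierType.
LOOSE = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+")

if __name__ == "__main__":
    for name in ["a-.com", "a-b.com", "ebipol.p.lodz.pl"]:
        # fullmatch mirrors the implicit anchoring of schema patterns.
        print(name,
              "loose:", LOOSE.fullmatch(name) is not None,
              "strict:", STRICT.fullmatch(name) is not None)
```

In this sketch only "a-.com" is treated differently: the relaxed pattern accepts it and the stricter one rejects it, while both accept names with single-letter subdomains.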


On Tue, 22 May 2007, Agnieszka Lewandowska wrote:
> There might be an error in one of the documents describing the format of 
> the OAI Identifier. The regular expression
> 	"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"
>
> from a file under URI:
>
> http://www.openarchives.org/OAI/2.0/oai-identifier.xsd
>
> does not validate a proper URL address: 'ebipol.p.lodz.pl' (the part with
> '.p' is causing an error). Furthermore, a regular expression from the page
>
> http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
>
> (especially point 2.1) permits the URL 'ebipol.p.lodz.pl'. After a small
> change in the regular expression:
>
> "oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"
>
> it works.


