[OAI-implementers] Re: [OAI-PMH] an error in a regular expression describing an OAI Identifier

Simeon Warner simeon at cs.cornell.edu
Thu Jun 7 15:50:03 EDT 2007


I have updated the oai-identifier schema as below. Current schema
   http://openarchives.org/OAI/2.0/oai-identifier.xsd
Previous version for reference at
   http://openarchives.org/OAI/2.0/oai-identifier.2002-06-21.xsd
Updated test instance which points at current schema
   http://openarchives.org/OAI/2.0/oai-identifier-test4.xml

Cheers,
Simeon

On Tue, 22 May 2007, Simeon Warner wrote:
> Hi All,
>
> Agnieszka Lewandowska has pointed out an error in the patterns matching 
> domain names in the oai-identifier.xsd schema (message excerpt below, the 
> current schema doesn't permit single letter subdomain names). Such names 
> should be permitted (see: http://www.ietf.org/rfc/rfc1035.txt) so I propose 
> the following updates:
>
> in definition of repositoryIdentifierType:
>
> <       <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"/>
> ---
>>       <pattern value="[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"/>
>
> and in definition of sampleIdentifierType:
>
> <       <pattern 
> value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"/>
> --
>>       <pattern 
>> value="oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"/>
>
> I have put the updated schema online at
>  http://openarchives.org/OAI/2.0/oai-identifier.xsd.2007-05-22
> and there is a test instance at
>  http://openarchives.org/OAI/2.0/oai-identifier-test4.xml
>
> This change should not invalidate any currently valid instance. Unless 
> someone points out an error I will update the live schema version in a week 
> or two.
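
The claim that the change does not invalidate any currently valid instance can be spot-checked with a short script. This is a minimal sketch, not part of the original message: it assumes Python's `re.fullmatch` is a close enough stand-in for the implicit anchoring of XSD patterns, and the sample names other than 'ebipol.p.lodz.pl' and "a-.com" are illustrative.

```python
import re

# repositoryIdentifierType patterns copied from the diff above.
# XSD patterns must match the whole value, so fullmatch() is used here
# to approximate that behaviour.
OLD = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+")
NEW = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+")


def check(name):
    """Return (accepted_by_old, accepted_by_new) for a repository identifier."""
    return (OLD.fullmatch(name) is not None, NEW.fullmatch(name) is not None)


# Illustrative samples; only the second and last appear in this thread.
samples = ["arXiv.org", "ebipol.p.lodz.pl", "a.b", "example", "a-.com"]

for name in samples:
    old_ok, new_ok = check(name)
    # Anything the old pattern accepted must still be accepted by the new one.
    assert not old_ok or new_ok
    print(f"{name!r}: old={old_ok} new={new_ok}")
```

Replacing `+` with `*` only relaxes the minimum length of each later label from two characters to one, so the new pattern accepts a superset of what the old one did.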
>
> Cheers,
> Simeon
>
>
>
> (For the really pedantic, the schema pattern is too broad in that it permits 
> a subdomain name ending in a hyphen (e.g. "a-.com"), which is not valid 
> according to RFC1035. Correcting this would make the patterns more 
> complicated, and I think it probably isn't worth changing to something 
> like
> <pattern 
> value="[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?(\.[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?)+"/> 
> However, we could do this if people see value in it.)
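
To illustrate what the stricter pattern in the parenthetical above would accept and reject, here is a small sketch. It is an assumption-laden illustration rather than a proposal: Python's `re.fullmatch` stands in for XSD pattern matching, the helper name `is_strict_domain` is hypothetical, and the sample names other than "a-.com" are made up.

```python
import re

# Stricter pattern from the parenthetical above: each label starts with a
# letter and, if longer than one character, must end in a letter or digit.
STRICT = re.compile(
    r"[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?"
    r"(\.[a-zA-Z]([a-zA-Z0-9]|[a-zA-Z0-9\-]+[a-zA-Z0-9])?)+"
)


def is_strict_domain(name):
    """Hypothetical helper: True if the whole name matches the strict pattern."""
    return STRICT.fullmatch(name) is not None


# Labels ending in a hyphen are rejected; interior hyphens and
# single-letter labels are still accepted.
for name in ["a-.com", "a.b-", "a-b.com", "a.b", "ebipol.p.lodz.pl"]:
    print(f"{name!r}: {is_strict_domain(name)}")
```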
>
>
> On Tue, 22 May 2007, Agnieszka Lewandowska wrote:
>> One of the documents describing the format of the OAI Identifier may contain 
>> an error. The regular expression
>> 	"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"
>> 
>> from the file at the URI:
>> 
>> http://www.openarchives.org/OAI/2.0/oai-identifier.xsd
>> 
>> does not validate a proper domain name: 'ebipol.p.lodz.pl' (the part with
>> '.p' causes the error). Furthermore, the regular expression on the page
>> 
>> http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm
>> 
>> (especially section 2.1) permits the name 'ebipol.p.lodz.pl'. After a small
>> change to the regular expression:
>> 
>> "oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+:[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+"
>> 
>> it works.
>
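
The full sampleIdentifierType patterns quoted in this thread contain the XML entities `&apos;` and `&amp;`, which have to be decoded before the pattern is tried outside a schema validator. The sketch below is one way to do that, under these assumptions: `html.unescape` is used for entity decoding, `re.fullmatch` approximates XSD anchoring, the helper name `matches` is hypothetical, and both identifiers are made-up examples (the local parts are not taken from any real repository).

```python
import html
import re

# Patterns exactly as they appear in the XSD attribute values, entities included.
OLD_XML = (r"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]+)+"
           r":[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+")
NEW_XML = (r"oai:[a-zA-Z][a-zA-Z0-9\-]*(\.[a-zA-Z][a-zA-Z0-9\-]*)+"
           r":[a-zA-Z0-9\-_\.!~\*&apos;\(\);/\?:@&amp;=\+$,%]+")


def matches(pattern_xml, identifier):
    """Decode XML entities in the pattern, then test the whole identifier."""
    pattern = html.unescape(pattern_xml)  # &apos; -> '   &amp; -> &
    return re.fullmatch(pattern, identifier) is not None


# Hypothetical identifiers: the first has a single-letter subdomain label.
for ident in ["oai:ebipol.p.lodz.pl:123", "oai:arXiv.org:hep-th/9901001"]:
    print(ident, matches(OLD_XML, ident), matches(NEW_XML, ident))
```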


