[OAI-implementers] Perl regexp for validating 'identifier' (anyURI) needed

Tim Brody tim@tim.brody.btinternet.co.uk
Wed, 26 Feb 2003 13:15:50 +0000


The regexps I use are:

For identifier:
/^[[:alpha:]][[:alnum:]\+\-\.]*:.+/

For setspec:
/^[A-Za-z0-9_!'\$\(\)\+\-\.\*]+(:[A-Za-z0-9_!'\$\(\)\+\-\.\*]+)*$/

For metadata prefix:
/^[\w]+$/

And date:
/^(\d{4})-(\d{2})-(\d{2})(T\d{2}:\d{2}:\d{2}Z)?$/
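To show what each pattern accepts, here is a small Perl sketch that runs them against some made-up sample values. It assumes ^ and $ anchors on the setSpec pattern. Without them, a value such as 'bad set' would pass, because the pattern only needs to match the substring 'bad'.

```perl
use strict;
use warnings;

# The four patterns from above. The setSpec one is anchored with ^...$ here
# so that it has to match the whole value, not just part of it.
my %pattern = (
    identifier     => qr/^[[:alpha:]][[:alnum:]\+\-\.]*:.+/,
    setSpec        => qr/^[A-Za-z0-9_!'\$\(\)\+\-\.\*]+(?::[A-Za-z0-9_!'\$\(\)\+\-\.\*]+)*$/,
    metadataPrefix => qr/^\w+$/,
    date           => qr/^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}Z)?$/,
);

# Made-up sample values, each paired with whether it should be accepted.
my @cases = (
    [identifier     => 'oai:arXiv.org:hep-th/9901001', 1],
    [identifier     => 'no-scheme-here',               0],
    [setSpec        => 'physics:hep',                  1],
    [setSpec        => 'bad set',                      0],
    [metadataPrefix => 'oai_dc',                       1],
    [date           => '2003-02-26',                   1],
    [date           => '2003-02-26T13:15:50Z',         1],
    [date           => '26/02/2003',                   0],
);

for my $case (@cases) {
    my ($type, $value, $expect) = @$case;
    my $ok = $value =~ $pattern{$type} ? 1 : 0;
    printf "%-14s %-30s %s\n", $type, $value,
        $ok == $expect ? 'as expected' : 'UNEXPECTED';
}
```

The identifier pattern only checks for a URI scheme followed by a colon. It does not check which characters appear in the rest of the URI.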

These are taken from my oai-perl libraries, which contain a module
"OAI2::Repository" with a method that determines whether OAI arguments
are valid (it draws heavily on Simeon's DLib tutorial from all those years
ago :-).

All the best,
Tim.

# Copied from Simeon Warner's tutorial at
# http://library.cern.ch/HEPLW/4/papers/3/OAIServer.pm
# (note: his is the wrong grammar for ListSets)
# Each argument maps to [status, validator]: 0 = optional, 1 = required, 2 = exclusive
my %grammer = (
         'GetRecord' =>
         {
                 'identifier' => [1, \&validate_identifier],
                 'metadataPrefix' => [1, \&validate_metadataPrefix]
         },
         'Identify' => {},
         'ListIdentifiers' =>
         {
                 'from' => [0, \&validate_date],
                 'until' => [0, \&validate_date],
                 'set' => [0, \&validate_setSpec_2_0],
                 'metadataPrefix' => [1, \&validate_metadataPrefix],
                 'resumptionToken' => [2, sub { 1 }]
         },
         'ListMetadataFormats' =>
         {
                 'identifier' => [0, \&validate_identifier]
         },
         'ListRecords' =>
         {
                 'from' => [0, \&validate_date],
                 'until' => [0, \&validate_date],
                 'set' => [0, \&validate_setSpec_2_0],
                 'metadataPrefix' => [1, \&validate_metadataPrefix],
                 'resumptionToken' => [2, sub { 1 }]
         },
         'ListSets' =>
         {
                 'resumptionToken' => [2, sub { 1 }]
         }
);
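This table also answers Marin's second question. For ListRecords, resumptionToken is marked exclusive (2), so sending it together with metadataPrefix and until should produce badArgument. Below is a minimal sketch of a checker that walks the table. The check_args function and the validate_* subs are hypothetical (built from the regexps above) and are not the actual OAI2::Repository code. The table is also cut down to ListRecords for brevity.

```perl
use strict;
use warnings;

# Hypothetical validators built from the regexps above (setSpec anchored).
sub validate_metadataPrefix { return $_[0] =~ /^\w+$/ }
sub validate_date           { return $_[0] =~ /^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}:\d{2}Z)?$/ }
sub validate_setSpec_2_0    { return $_[0] =~ /^[A-Za-z0-9_!'\$\(\)\+\-\.\*]+(?::[A-Za-z0-9_!'\$\(\)\+\-\.\*]+)*$/ }

# Cut-down copy of the grammar table: ListRecords only.
my %grammer = (
    'ListRecords' => {
        'from'            => [0, \&validate_date],
        'until'           => [0, \&validate_date],
        'set'             => [0, \&validate_setSpec_2_0],
        'metadataPrefix'  => [1, \&validate_metadataPrefix],
        'resumptionToken' => [2, sub { 1 }],
    },
);

# Returns an OAI error code string, or undef if the arguments look valid.
sub check_args {
    my %args = @_;
    my $verb = delete $args{verb};
    return 'badVerb' unless defined $verb && exists $grammer{$verb};
    my $rules = $grammer{$verb};

    # Any argument not listed for this verb is illegal.
    for my $name (keys %args) {
        return 'badArgument' unless exists $rules->{$name};
    }

    # An exclusive argument (status 2) must be the only one besides verb.
    my $n_exclusive = grep { $rules->{$_}[0] == 2 } keys %args;
    return 'badArgument' if $n_exclusive && keys(%args) > 1;

    # Without an exclusive argument, every required one (status 1) must be present.
    if (!$n_exclusive) {
        for my $name (keys %$rules) {
            return 'badArgument' if $rules->{$name}[0] == 1 && !exists $args{$name};
        }
    }

    # Finally, each supplied value must pass its validator.
    for my $name (keys %args) {
        return 'badArgument' unless $rules->{$name}[1]->($args{$name});
    }
    return;
}

# Marin's request: resumptionToken combined with other arguments.
my $err = check_args(
    verb            => 'ListRecords',
    metadataPrefix  => 'oai_dc',
    resumptionToken => 'junk',
    until           => '1990-01-10',
);
print defined $err ? $err : 'ok', "\n";    # prints "badArgument"
```

The exclusivity check runs before the token is inspected, so the request is rejected no matter what "junk" contains.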


marinb@gmx.net wrote:
> Hi all.
> 
> I am sure somebody has already written/found a reasonably good Perl regexp
> for validating the identifier parameter. I could only find one for decoding:
> 
> m|^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?|
> 
> but it is not suitable for validating, as no check is made for allowed
> characters within each 'fragment'. There must be a better solution than
> extracting the fragments and validating each of them separately?
> 
> Can anybody also tell me what the problem is with the following request?
> 
> The response to this request did not return the error code 'badArgument':
> verb=ListRecords&metadataPrefix=oai_dc&resumptionToken=junk&until=1990-01-10
> 
> I would very much appreciate any help.
> Cheers,
> Marin
>