DAIA - Document Availability Information API in Perl


DAIA documentation Contained in the DAIA distribution.

Index


Code Index:

NAME

Top

DAIA - Document Availability Information API in Perl

DESCRIPTION

Top

The Document Availability Information API (DAIA) defines a data model with serializations in JSON and XML to encode information about the current availability of documents. See http://daia.sourceforge.net/ for more information and the recent developer version. This package provides Perl classes and functions to easily create and manage DAIA information. It can be used to implement DAIA servers, clients, and other programs that handle availability information.

For a detailed description what "availability" means in context of DAIA, see the DAIA specification. This implementation directly maps DAIA information objects to Perl objects, that all provide some standard methods. You can also let the package export functions to handle DAIA data without much object-orientation.

In short the most important concepts of DAIA are:

documents

An abstract document (work or edition). Implemented as DAIA::Document.

items

A particular copy of a document (physical or digital), that services can be provided with. Implemented as DAIA::Item.

availability status

A boolean value and a service that indicates for what an item is available or not available. Implemented as DAIA::Availability with the subclasses DAIA::Available and DAIA::Unavailable.

responses

Information about the availability of a document with a timestamp. Responses are used to send and recieve DAIA data. Implemented as DAIA::Response.

SYNOPSIS

Top

DAIA client

  #!/usr/bin/perl
  use DAIA;

  $daia = DAIA::parse( $url );          # parse from URL
  $daia = DAIA::parse( file => $file ); # parse from File

  # parse from string
  use Encode; # if incoming data is unencoded UTF-8
  $data = Encode::decode_utf8( $data ); # skip this if $data is just Unicode
  $daia = DAIA::parse( data => $string );

This package also includes and installs the command line and CGI client daia to fetch, validate and convert DAIA data. See also the clients directory for an XML Schema of DAIA/XML and an XSLT script to transform it to HTML.

DAIA server

First an example of a DAIA server as CGI script. You need to implement all get_... methods to return meaningful values. Some more hints how to run a DAIA Server below under under DAIA Server hints.

  #!/usr/bin/perl
  use DAIA;
  use CGI; # or some other CGI module, for instance CGI::Minimal
  use utf8; # if source code containts UTF-8

  my $r = response( institution => {
          href    => "http://example.com/homepage.of.institution",
          content => "Name of the Institution" 
  } );

  my $id = CGI->new->param('id');
  $r->addMessage("en" => "Not an URI: $id", errno => 1 )
      unless DAIA::is_uri($id);
  my @holdings = get_holding_information($id);  # YOU need to implement this!

  if ( @holdings ) {
      my $doc = document( id => $id, href => "http://example.com/docs/$id" );
      foreach my $h ( @holdings ) {
          my $item = item();

          my %sto = get_holding_storage( $h );
          $item->storage( id => $sto{id}, href => $sto{href}, $sto{name} );

          my $label = get_holding_label( $h );
          $item->label( $label );

          my $url = get_holding_url( $h );
          $item->href( $url );

          # add availability services
          my @services;

          if ( get_holding_is_here( $h ) ) {
              push @services, available('presentation'), available('loan');
          } elsif( get_holding_is_not_here( $h ) ) {
              push @services, # expected to be back in 5 days
              unavailable( 'presentation', expected => 'P5D' ),
              unavailable( 'loan', expected => 'P5D' );
          } else {
             #  more cases (depending on the complexity of you application)
          }
          $item->add( @services );
      }
      $r->document( $doc );
  } else {
      $r->addMessage( "en" => "No holding information found for id $id" );
  }

  $r->serve( xslt => "http://path.to/daia.xsl" );

In order to get your script run as CGI, you may have to enable CGI with Options +ExecCGI and AddHandler cgi-script .pl in your Apache configuration or .htaccess.

EXPORTED FUNCTIONS

Top

If you prefer function calls in favor of constructor calls, this package providesfunctions for each DAIA class constructor. The functions are named by the object that they create but in lowercase - for instance response for the DAIA::Response object. The functions can be exported in groups. To disable exporting of the functions include DAIA like this:

  use DAIA qw();      # do not export any functions
  use DAIA qw(serve); # only export function 'serve'

By default all functions are exported (group :all) which adds 13 functions to the default namespace! Alternatively you can specify the following groups:

:core

Includes the functions response (DAIA::Response), document (DAIA::Document), item (DAIA::Item), available (DAIA::Available), unavailable (DAIA::Unavailable), and availability (DAIA::Availability)

:entities

Includes the functions institution (DAIA::Institution), department (DAIA::department), storage (DAIA::Storage), and limitation (DAIA::Limitation)

The functions message, error and serve are also exported by default. See DAIA::Message for the parameters of message or error.

serve( [ [ format => ] $format ] [ %options ] )

Calls the method method serve of DAIA::Response or another DAIA object to serialize and send a response to STDOUT with appropriate HTTP headers. You can call it this way:

  serve( $response, @additionlArgs );  # as function
  $response->serve( @additionlArgs );  # as method

ADDITIONAL FUNCTIONS

Top

The following functions are not exported but you can call both them as function and as method:

  DAIA->parse_xml( $xml );
  DAIA::parse_xml( $xml );

On request you can export the functions guess and parse.

parse_xml( $xml )

Parse DAIA/XML from a file or string. The first parameter must be a filename, a string of XML, or a IO::Handle object.

Parsing is more lax then the specification so it silently ignores elements and attributes in foreign namespaces. Returns either a DAIA object or croaks on uncoverable errors.

parse_json( $json )

Parse DAIA/JSON from a file or string. The first parameter must be a filename, a string of XML, or a IO::Handle object.

parse ( $from [ %parameters ] )

Parse DAIA/XML or DAIA/JSON from a file or string. You can specify the source as filename, string, or IO::Handle object as first parameter or with the named from parameter. Alternatively you can either pass a filename or URL with parameter file or a string with parameter data. If from or file is an URL, its content will be fetched via HTTP. The format parameter (json or xml) is required unless the format can be detected automatically the following way:

Normally this function or method returns a single DAIA object. When parsing DAIA/XML it may also return a list of objects. It is recommended to always expect a list unless you are absolutely sure that the result of parsing will be a single DAIA object.

guess ( $string )

Guess serialization format (DAIA/JSON or DAIA/XML) and return json, xml or the empty string.

is_uri ( $value )

Checks whether the value is a well-formed URI. This function is imported from Data::Validate::URI into the namespace of this package as DAIA::is_uri. On request the function can be exported into the default namespace.

DAIA OBJECTS

Top

All objects (documents, items, availability, status, institutions, departments, limitations, storages, messages, errors) are implemented as subclass of DAIA::Object. Therefore, all objects have the following methods:

new

Constructs a new object.

add

Adds typed properties.

xml, struct, json, rdfhash

Returns several serialization forms.

serve

Serialize the object and send it to STDOUT with the appropriate HTTP headers.

DAIA Server hints

Top

DAIA server scripts can be tested on command line by providing HTTP parameters as key=value pairs.

It is recommended to run a DAIA server via mod_perl or FastCGI so it does not need to be compiled each time it is run. For mod_perl you simply put your script in a directory which PerlResponseHandler has been set for (for instance to Apache::Registry or ModPerl::PerlRun).

For FastCGI you need to install FCGI and set the CGI handler to AddHandler fcgid-script .pl in .htaccess. Your DAIA server must consist of an initialization section and a response loop:

  #!/usr/bin/perl
  use DAIA;
  use CGI::Fast;

  # ...initialization section, which is executed only once ...

  while (my $q = new CGI::Fast) { # response loop
      my $id = $q->param('id');

      # ... create response ...

      $response->serve( cgi => $q, exitif => 0 );
  }

The serve methods needs a cgi or format parameter and it is been told not to exit the script. It is recommended to check every given timespan whether the script has been modified and restart in this case:

  #!/usr/bin/perl
  use DAIA;
  use CGI::Fast;

  my $started = time;
  my $thisscript = $0;
  my $lastmod = (stat($thisscript))[9] # mtime;

  sub restart {
      return 0 if time - $started < 10; # check every 10 seconds
      return 1 if (stat($thisscript))[9] > $lastmod;
  }

  while (my $q = new CGI::Fast) { # response loop

      # ... create response ...

      $response->serve( $q, exitif => \&restart } );
  }




SEE ALSO

Top

Please report bugs and feature requests via https://rt.cpan.org/Public/Dist/Display.html?Name=DAIA. The classes of this package are implemented using DAIA::Object which is just another Perl meta-class framework.

The current developer version of this package together with more DAIA implementations in other programming languages is availabe in a project at Sourceforge: http://sourceforge.net/projects/daia/. Feel free to contribute!

A specification of DAIA can be found at http://purl.org/NET/DAIA.

AUTHOR

Top

Jakob Voss <jakob.voss@gbv.de>

LICENSE

Top

Copyright (C) 2009-2010 by Verbundzentrale Goettingen (VZG) and Jakob Voss

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.


DAIA documentation Contained in the DAIA distribution.
package DAIA;

use strict;
our $VERSION = '0.31';

use base 'Exporter';
our %EXPORT_TAGS = (
    core => [qw(response document item available unavailable availability)],
    entities => [qw(institution department storage limitation)],
);
our @EXPORT_OK = qw(is_uri parse guess);
Exporter::export_ok_tags;
$EXPORT_TAGS{all} = [@EXPORT_OK, 'message', 'serve', 'error'];
Exporter::export_tags('all');

use Carp; # use Carp::Clan; # qw(^DAIA::);
use IO::File;
use LWP::Simple qw(get);
use XML::Simple; # only for parsing (may be changed)

use DAIA::Response;
use DAIA::Document;
use DAIA::Item;
use DAIA::Availability;
use DAIA::Available;
use DAIA::Unavailable;
use DAIA::Message;
use DAIA::Entity;
use DAIA::Institution;
use DAIA::Department;
use DAIA::Storage;
use DAIA::Limitation;

use Data::Validate::URI qw(is_uri);

sub response     { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Response->new( @_ ) }
sub document     { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Document->new( @_ ) }
sub item         { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Item->new( @_ ) }
sub available    { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Available->new( @_ ) }
sub unavailable  { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Unavailable->new( @_ ) }
sub availability { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Availability->new( @_ ) }
sub message      { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Message->new( @_ ) }
sub institution  { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Institution->new( @_ ) }
sub department   { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Department->new( @_ ) }
sub storage      { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Storage->new( @_ ) }
sub limitation   { local $Carp::CarpLevel = $Carp::CarpLevel + 1; return DAIA::Limitation->new( @_ ) }

sub error { 
    local $Carp::CarpLevel = $Carp::CarpLevel + 1; 
    my $errno = @_ ? shift : 0;
    return DAIA::Message->new( @_ ? (@_, errno => $errno) : (errno => $errno) );
}

sub serve {
    local $Carp::CarpLevel = $Carp::CarpLevel + 1; 
    shift->serve( @_ );
}

sub parse_xml {
    shift if UNIVERSAL::isa( $_[0], __PACKAGE__ );
    DAIA::parse( shift, format => 'xml', @_ );
}

sub parse_json {
    shift if UNIVERSAL::isa( $_[0], __PACKAGE__ );    
    DAIA::parse( shift, format => 'json' );
}

sub parse {
    shift if UNIVERSAL::isa( $_[0], __PACKAGE__ );
    my ($from, %param) = (@_ % 2) ? (@_) : (undef,@_);
    $from = $param{from} unless defined $from;
    $from = $param{data} unless defined $from;
    my $format = lc($param{format});
    my $file = $param{file};
    $file = $from if defined $from and $from =~ /^http(s)?:\/\//;
    if (not defined $file and defined $from and not defined $param{data}) {
        if( ref($from) eq 'GLOB' or UNIVERSAL::isa($from, 'IO::Handle')) {
            $file = $from;
        } elsif( $from eq '-' ) {
            $file = \*STDIN;
        } elsif( $from =~ /\.(xml|json)$/ ) {
            $file = $from ;
            $format = $1 unless $format;
        }
    }
    if ( $file ) {
        if ( $file =~ /^http(s)?:\/\// ) {
            $from = get($file) or croak "Failed to fetch $file via HTTP"; 
        } else {
            if ( ! (ref($file) eq 'GLOB' or UNIVERSAL::isa( $file, 'IO::Handle') ) ) {
                $file = do { IO::File->new($file, '<:utf8') or croak("Failed to open file $file") };
            }
            # Enable :utf8 layer unless it or some other encoding has already been enabled
            # foreach my $layer ( PerlIO::get_layers( $file ) ) {
            #    return if $layer =~ /^encoding|^utf8/;
            #}
            binmode $file, ':utf8';
            $from = do { local $/; <$file> };
        }
        croak "DAIA serialization is empty" unless $from;
    }

    croak "Missing source to parse from " unless defined $from;

    $format = guess($from) unless $format;

    my $value;
    my @objects;
    my $root = 'Response';

    if ( $format eq 'xml' ) {
        # do not look for filename (security!)
        if (defined $param{data} and guess($from) ne 'xml') {
            croak("XML is not well-formed (<...>)");
        }

        if (guess($from) eq 'xml') {
            utf8::encode($from);;
            #print "IS UTF8?". utf8::is_utf8($from) . "\n";
        }

        my $xml = eval { XMLin( $from, KeepRoot => 1, NSExpand => 1, KeyAttr => [ ] ); };
        $xml = daia_xml_roots($xml);

        croak $@ if $@;
        croak "XML does not contain DAIA elements" unless $xml;

        while (my ($root,$value) = each(%$xml)) {
            $root =~ s/{[^}]+}//;
            $root = ucfirst($root);
            $root = 'Response' if $root eq 'Daia';

            _filter_xml( $value ); # filter out all non DAIA elements and namespaces

            $value = [ $value ] unless ref($value) eq 'ARRAY';

            foreach my $v (@$value) {
                # TODO: croak of $root is not known!
                my $object = eval 'DAIA::'.$root.'->new( $v )';  ##no critic
                croak $@ if $@;
                push @objects, $object;
            }
        }

    } elsif ( $format eq 'json' ) {
        eval { $value = JSON->new->decode($from); };
        croak $@ if $@;

        if ( (keys %$value) == 1 ) {
            my ($k => $v) = %$value;
            if (not $k =~ /^(timestamp|message|institution|document)$/ and ref($v) eq 'HASH') {
                ($root, $value) = (ucfirst($k), $v);
            }
        }

        # outdated variants
        $root = "Response" if $root eq 'Daia';
        delete $value->{'xmlns:xsi'};

        delete $value->{schema} if $root eq 'Response'; # ignore schema attribute

        croak "JSON does not contain DAIA elements" unless $value;
        push @objects, eval('DAIA::'.$root.'->new( $value )');  ##no critic
        croak $@ if $@;

    } else {
        croak "Unknown DAIA serialization format $format";
    }

    return if not wantarray and @objects > 1;
    return wantarray ? @objects : $objects[0];
}

sub guess {
    my $data = shift;
    return '' unless $data;
    return 'xml' if $data =~ m{^\s*\<.*?\>\s*$}s;
    return 'json' if $data =~ m{^\s*\{.*?\}\s*$}s;
    return '';
}

#### internal methods (subject to be changed)

my $NSEXPDAIA = qr/{http:\/\/(ws.gbv.de|purl.org\/ontology)\/daia\/}(.*)/;

# =head1 daia_xml_roots ( $xml )
#
# This internal method is passed a hash reference as parsed by L<XML::Simple>
# and traverses the XML tree to find the first DAIA element(s). It is needed
# if DAIA/XML is wrapped in other XML structures.
#
# =cut

sub daia_xml_roots {
    my $xml = shift; # hash reference
    my $out = { };

    return { } unless UNIVERSAL::isa($xml,'HASH');

    foreach my $key (keys %$xml) {
        my $value = $xml->{$key};

        if ( $key =~ /^{([^}]*)}(.*)/ and !($key =~ $NSEXPDAIA) ) {
            # non DAIA element
            my $children = UNIVERSAL::isa($value,'ARRAY') ? $value : [ $value ];
            @$children = grep {defined $_} map { daia_xml_roots($_) } @$children;
            foreach my $n (@$children) {
                while ( my ($k,$v) = each(%{$n}) ) {
                    next if $k =~ /^xmlns/;
                    $v = [$v] unless UNIVERSAL::isa($v,'ARRAY');
                    if ($out->{$k}) {
                        push @$v, (UNIVERSAL::isa($out->{$k},'ARRAY') ? 
                                   @{$out->{$k}} : $out->{$k});
                    }
                    # filter out scalars
                    @$v = grep {ref($_)} @$v unless $k =~ $NSEXPDAIA;
                    if (@$v) {
                        $out->{$k} = (@$v > 1 ? $v : $v->[0]); 
                    }
                }
            }
        } else { # DAIA element or element without namespace
            $out->{$key} = $value;
        }
    }

    return $out;
}

# filter out non DAIA XML elements and 'xmlns' attributes
sub _filter_xml { 
    my $xml = shift;
    map { _filter_xml($_) } @$xml if ref($xml) eq 'ARRAY';
    return unless ref($xml) eq 'HASH';

    my (@del,%add);
    foreach my $key (keys %$xml) {
        if ($key =~ /^{([^}]*)}(.*)/) {
            my $local = $2;
            if ($1 =~ /^http:\/\/(ws.gbv.de|purl.org\/ontology)\/daia\/$/) {
                $xml->{$local} = $xml->{$key};
            }
            push @del, $key;
        } elsif ($key =~ /^xmlns/ or $key =~ /:/) {
            push @del, $key;
        }
    }

    # remove non-daia elements
    foreach (@del) { delete $xml->{$_}; }

    # recurse
    map { _filter_xml($xml->{$_}) } keys %$xml;
}

1;