Mac::PropertyList::SAX - work with Mac plists at a low level, fast


Mac-PropertyList-SAX documentation Contained in the Mac-PropertyList-SAX distribution.

Index


Code Index:

NAME

Top

Mac::PropertyList::SAX - work with Mac plists at a low level, fast

SYNOPSIS

Top

See Mac::PropertyList

DESCRIPTION

Top

Mac::PropertyList is useful, but very slow on large files because it does XML parsing itself, intead of handing it off to a dedicated parser. This module uses XML::SAX::ParserFactory to select a parser capable of doing the heavy lifting, reducing parsing time on large files by a factor of 30 or more.

This module does not replace Mac::PropertyList: it depends on it for some package definitions and plist printing routines. You should, however, be able to replace all use Mac::PropertyList lines with use Mac::PropertyList::SAX, without changing anything else, and notice an immediate improvement in performance on large input files.

Performance will depend largely on the parser that XML::SAX::ParserFactory selects for you. By default, XML::SAX::Expat is used; to change the parser used, set the environment variable MAC_PROPERTYLIST_SAX_PARSER to a value accepted by $XML::SAX::ParserPackage from XML::SAX::ParserFactory (or set $XML::SAX::ParserPackage directly).

CLASS VARIABLES

Top

Class scoped variables that control the packages settings.

ENCODE_ENTITIES

Allows the XHTML encoding of the data to be turned off. Default = 1

ENCODE_UNSAFE_CHARS

A Perl character class definition containing the only characters to be XHTML encoded. See HTML::Entities::encode_entities for description of the $unsafe_chars parameter. Default = undef

OLD_BEHAVIOR

Restores the old behavior of double encoding output data. Default = 0

XML::SAX::ParserPackage

Parser to use. Can also be set with environment variable MAC_PROPERTYLIST_SAX_PARSER. Default = "XML::SAX::Expat"

EXPORTS

Top

By default, no functions are exported. Specify individual functions to export as usual, or use the tags ':all', ':create', and ':parse' for the appropriate sets of functions (':create' includes the create* functions as well as plist_as_string; ':parse' includes the parse* functions).

FUNCTIONS

Top

parse_plist_file

See parse_plist_file in Mac::PropertyList

parse_plist_fh

See parse_plist_fh in Mac::PropertyList

parse_plist

See parse_plist in Mac::PropertyList

parse_plist_string

An alias to parse_plist, provided for better regularity compared to Perl SAX.

create_from_ref( HASH_REF | ARRAY_REF )

Create a plist from an array or hash reference.

The values of the hash can be simple scalars or references. Hash and array references are handled recursively, and Mac::PropertyList objects are output correctly. All other scalars are treated as strings (use Mac::PropertyList objects to represent other types of scalars).

Returns a string representing the reference in serialized plist format.

create_from_hash( HASH_REF )

Provided for backward compatibility with Mac::PropertyList: aliases create_from_ref.

create_from_array( ARRAY_REF )

Provided for backward compatibility with Mac::PropertyList: aliases create_from_ref.

_escape( STRING )

Internal use only. Escapes illegal characters into XML entities.

BUGS / CAVEATS

Top

Any sane XML parser you can find to use with this module will decode XHTML-encoded entities in the original property list; Mac::PropertyList doesn't touch them. Also, your XML parser may convert accented/special characters into '\x{ff}' sequences; these are preserved in their original encoding by Mac::PropertyList.

Before version 0.80 of this module, characters invalid in XML were not serialized properly from create_from_ref(); before version 0.82, they were not serialized properly in plist_as_string(). Thanks to Jon Connell for pointing out these problems.

Unlike Mac::PropertyList and old versions (< 0.60) of Mac::PropertyList::SAX, this module does not trim leading and trailing whitespace from plist elements. The difference in behavior is thought to be rarely noticeable; in any case, I believe this module's current behavior is the more correct. Any documentation that covers this problem would be appreciated.

The behavior of create_from_hash and create_from_array has changed: these functions (which are really just aliases to the new create_from_ref function) are now capable of recursively serializing complex data structures. That is: for inputs that Mac::PropertyList's create_from_* functions handled, the output should be the same, but this module supports inputs that Mac::PropertyList does not.

Before version 0.83, this module left the selection of a SAX-based parser entirely to the discretion of XML::SAX::ParserFactory. Unfortunately, it seems impossible to guarantee that the parser returned even supports XML (XML::SAX::RTF could be returned), so it has become necessary to select a parser by default: XML::SAX::Expat, which is now part of the dependencies of this module. If you know you will use another parser of a specific name, you can force installation without XML::SAX::Expat and always specify the parser you wish to use by setting $XML::SAX::ParserPackage or the MAC_PROPERTYLIST_SAX_PARSER environment variable (see "DESCRIPTION").

Before version 0.85, this module contained a bug that caused double encoding of special characters as X[HT]ML entities. Thanks to Bion Pohl and http://ingz-inc.com/ for reporting this issue and supplying a fixed version. The implementation of the $ENCODE_ENTITIES variable and the addition of the $ENCODE_UNSAFE_CHARS variable are also due to Bion Pohl and / or http://ingz-inc.com/.

SUPPORT

Top

Please contact the author with bug reports or feature requests.

AUTHOR

Top

Darren M. Kulp, <kulp @ cpan.org>

THANKS

Top

brian d foy, who created the Mac::PropertyList module whose tests were appropriated for this module.

Bion Pohl and http://ingz-inc.com, for bug report and patch submission.

SEE ALSO

Top

Mac::PropertyList, the inspiration for this module.

COPYRIGHT AND LICENSE

Top


Mac-PropertyList-SAX documentation Contained in the Mac-PropertyList-SAX distribution.
package Mac::PropertyList::SAX;

use strict;
use warnings;

use Carp qw(carp);
use HTML::Entities qw(encode_entities_numeric);
use HTML::Entities::Numbered qw(hex2name name2hex_xml);
# Passthrough function
use Mac::PropertyList qw(plist_as_string);
use XML::SAX::ParserFactory;

use base qw(Exporter);

our @EXPORT_OK = qw(
    parse_plist 
    parse_plist_fh
    parse_plist_file
    parse_plist_string
    plist_as_string
    create_from_ref
    create_from_hash
    create_from_array
);

our %EXPORT_TAGS = (
    all    => \@EXPORT_OK,
    create => [ qw(create_from_ref create_from_hash create_from_array plist_as_string) ],
    parse  => [ qw(parse_plist parse_plist_fh parse_plist_file parse_plist_string) ],
);

our $VERSION = '0.85';



our $ENCODE_ENTITIES     = 1;
our $ENCODE_UNSAFE_CHARS = undef;

our $OLD_BEHAVIOR = 0;

$XML::SAX::ParserPackage = $ENV{MAC_PROPERTYLIST_SAX_PARSER} || "XML::SAX::Expat";

sub parse_plist_file {
    my $file = shift;

    if (ref $file) {
        parse_plist_fh($file);
    } else {
        carp("parse_plist_file: file [$file] does not exist!"), return unless -e $file;
        _parse("parse_uri", $file);
    }
}

sub parse_plist_fh { _parse("parse_file", @_) }

sub parse_plist { _parse("parse_string", @_) }

*parse_plist_string = \&parse_plist;

sub _parse {
    my ($sub, $data) = @_;

    my $handler = Mac::PropertyList::SAX::Handler->new;
    XML::SAX::ParserFactory->parser(Handler => $handler)->$sub($data);

    $handler->{struct}
}

sub create_from_ref {
    sub _handle_value {
        my ($val) = @_;

        sub _handle_hash {
            my ($hash) = @_;
            Mac::PropertyList::SAX::dict->write_open,
                (map { "\t$_" } map {
                    Mac::PropertyList::SAX::dict->write_key($OLD_BEHAVIOR ? _escape($_) : $_),
                    _handle_value($hash->{$_}) } keys %$hash),
                Mac::PropertyList::SAX::dict->write_close
        }

        sub _handle_array {
            my ($array) = @_;
            Mac::PropertyList::SAX::array->write_open,
                (map { "\t$_" } map { _handle_value($_) } @$array),
                Mac::PropertyList::SAX::array->write_close
        }

        # We could hand off serialization of all Mac::PropertyList::Item objects
        # but there is no 'write' method defined for it (though all its
        # subclasses have one). Let's just handle Scalars, which are safe.
           if (UNIVERSAL::can($val, 'write')) { $val->write }
        elsif (UNIVERSAL::isa($val,  'HASH')) { _handle_hash ($val) }
        elsif (UNIVERSAL::isa($val, 'ARRAY')) { _handle_array($val) }
        else { Mac::PropertyList::SAX::string->new($OLD_BEHAVIOR ? _escape($val) : $val)->write }
    }

    $Mac::PropertyList::XML_head .
        (join "\n", _handle_value(shift)) . "\n" .
        $Mac::PropertyList::XML_foot;
}

*create_from_hash = \&create_from_ref;

*create_from_array = \&create_from_ref;

sub _escape {
    my $string = join("\n",grep(defined,@_));
    $ENCODE_ENTITIES && 
        return name2hex_xml(hex2name(encode_entities_numeric($string,
                                                             $ENCODE_UNSAFE_CHARS)));
    return $string;
}

package Mac::PropertyList::SAX::Handler;

use strict;
use warnings;
# State definitions
use enum qw(S_EMPTY S_TOP S_FREE S_DICT S_ARRAY S_KEY S_TEXT);

use Carp qw(carp croak);
use MIME::Base64;

# Element-name definitions
use constant +{ qw( ROOT  plist
                    KEY   key
                    DATA  data
                    DICT  dict
                    ARRAY array ) };

use base qw(XML::SAX::Base);

# From the plist DTD
our (%types, %simple_types, %complex_types, %numerical_types);
{
    my @complex_types   = (DICT, ARRAY);
    my @numerical_types = qw(real integer true false);
    my @simple_types    = qw(data date real integer string true false);
    my @types           = (@complex_types, @numerical_types, @simple_types);

    my $atoh = sub { map { $_ => 1 } @_ };

    %types           = $atoh->(@          types);
    %simple_types    = $atoh->(@   simple_types);
    %complex_types   = $atoh->(@  complex_types);
    %numerical_types = $atoh->(@numerical_types);
}

sub new {
    my %args = (
        accum   => "",
        context => S_EMPTY,
        key     => undef,
        stack   => [ ],
        struct  => undef,
    );

    shift->SUPER::new(%args, @_)
}

sub start_element {
    my $self = shift;
    my ($data) = @_;
    my $name = $data->{Name};

    # State transition definitions
         if ($self->{context} == S_EMPTY and $name eq ROOT) {
             $self->{context}  = S_TOP;
    } elsif ($self->{context} == S_TOP or $types{$name} or $name eq KEY) {
        push @{ $self->{stack} }, {
            key     => $self->{key},
            context => $self->{context},
            struct  => $self->{struct},
        };

        if ($complex_types{$name}) {
            $self->{struct} = "Mac::PropertyList::SAX::$name"->new;
            $self->{context} = eval "S_" . uc $name;
            delete $self->{key};
        }
        elsif ($simple_types{$name}) { $self->{context} = S_TEXT }
        elsif ($name eq KEY) {
            croak "<key/> in improper context $self->{context}" unless $self->{context} == S_DICT;
            $self->{context} = S_KEY;
        }
        else { croak "Top-level element '$name' in plist is not recognized" }
    } else {
        croak "Received invalid start element '$name'";
    }
}

sub end_element {
    my $self = shift;
    my ($data) = @_;
    my $name = $data->{Name};

    if ($name ne ROOT) { # Discard plist element
        my $elt = pop @{ $self->{stack} };

        my $value = $self->{struct};
        ($self->{struct}, $self->{key}, $self->{context}) = @{$elt}{qw(struct key context)};

        if ($simple_types{$name}) {
            # Wrap accumulated character data in an object
            $value = "Mac::PropertyList::SAX::$name"->new(
                exists $self->{accum}
                    ? $name eq DATA
                        ? MIME::Base64::decode_base64($self->{accum})
                        : $self->{accum}
                    : ""
                );

            delete $self->{accum};
        } elsif ($name eq KEY) {
            $self->{key} = $self->{accum};
            delete $self->{accum};
            return;
        }

           if ($self->{context} == S_DICT ) {         $self->{struct}{$self->{key}} = $value }
        elsif ($self->{context} == S_ARRAY) { push @{ $self->{struct} },              $value }
        elsif ($self->{context} == S_TOP  ) {         $self->{struct}               = $value }
        else { croak "Bad context $self->{context}" }
    }
}

sub characters {
    my $self = shift;
    my ($data) = @_;
    $self->{accum} .= $data->{Data} if $self->{context} == S_TEXT or $self->{context} == S_KEY;
}

# Convenient subclasses
package Mac::PropertyList::SAX::array;
use base qw(Mac::PropertyList::array);
package Mac::PropertyList::SAX::dict;
use base qw(Mac::PropertyList::dict);
sub write_key { "<key>" . (Mac::PropertyList::SAX::_escape($_[1]) || '') . "</key>" }
package Mac::PropertyList::SAX::Scalar;
use base qw(Mac::PropertyList::Scalar);
sub write {
    $_[0]->write_open .
        (Mac::PropertyList::SAX::_escape($_[0]->value) || '') .
            $_[0]->write_close
}
use overload '""' => sub { $_[0]->as_basic_data };
package Mac::PropertyList::SAX::date;
use base qw(Mac::PropertyList::date Mac::PropertyList::SAX::Scalar);
package Mac::PropertyList::SAX::real;
use base qw(Mac::PropertyList::real Mac::PropertyList::SAX::Scalar);
package Mac::PropertyList::SAX::integer;
use base qw(Mac::PropertyList::integer Mac::PropertyList::SAX::Scalar);
package Mac::PropertyList::SAX::string;
use base qw(Mac::PropertyList::string Mac::PropertyList::SAX::Scalar);
sub write { $_[0]->Mac::PropertyList::SAX::Scalar::write }
use overload '""' => sub { $_[0]->as_basic_data };
package Mac::PropertyList::SAX::data;
use base qw(Mac::PropertyList::data Mac::PropertyList::SAX::Scalar);
package Mac::PropertyList::SAX::Boolean;
use Object::MultiType;
use base qw(Mac::PropertyList::Boolean Object::MultiType);
use overload '""' => sub { shift->value };
sub new {
    my $class = shift;
    my ($type) = $class =~ /::([^:]+)$/;
    my $b = lc $type eq "true";
    bless Object::MultiType->new(scalar => $type, bool => $b) => $class
}
sub value { ${${$_[0]}->scalar} }
package Mac::PropertyList::SAX::true;
use base qw(Mac::PropertyList::SAX::Boolean Mac::PropertyList::true);
package Mac::PropertyList::SAX::false;
use base qw(Mac::PropertyList::SAX::Boolean Mac::PropertyList::true);

1;

__END__

# vi: set et ts=4 sw=4: #