| XML-Filter-Dispatcher documentation | view source | Contained in the XML-Filter-Dispatcher distribution. |
XML::Filter::Dispatcher - Path based event dispatching with DOM support
    use XML::Filter::Dispatcher qw( :all );

    my $f = XML::Filter::Dispatcher->new(
        Rules => [
            'foo'  => \&handle_foo_start_tag,
            '@bar' => \&handle_bar_attr,

            ## Send any <snarf> elts and their contents to $handler
            'snarf//self::node()' => $handler,

            ## Print the text of all <description> elements
            'description'
                => [ 'string()' => sub { push @out, xvalue } ],
        ],

        Vars => {
            "id" => [ string => "12a" ],
        },
    );
WARNING: Beta code alert.
A SAX2 filter that dispatches SAX events based on "EventPath" patterns
as the SAX events arrive. The SAX events are not buffered or converted
to an in-memory document representation like a DOM tree. This provides
for low lag operation because the actions associated with each pattern
are executed as soon as possible, usually in an element's
start_element() event method.
This differs from traditional XML pattern matching tools like
XPath and XSLT (which is XPath-based) which require the entire
document to be built in memory (as a "DOM tree") before queries can be
executed. In SAX terms, this means that they have to build a DOM tree
from SAX events and delay pattern matching until the end_document()
event method is called.
A rule is composed of a pattern and an action. Each XML::Filter::Dispatcher instance has a list of rules. As SAX events are received, the rules are evaluated and one rule's action is executed. If more than one rule matches an event, the rule with the highest score wins; by default a rule's score is its position in the rule list, so the last matching rule in the list will be acted on.
A simple rule list looks like:
Rules => [
'a' => \&handle_a,
'b' => \&handle_b,
],
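The "last matching rule wins" behaviour can be pictured with a small stand-alone sketch. This is a toy model, not the dispatcher's implementation: pick_action() is a hypothetical helper and its patterns are plain element names rather than EventPath patterns.

```perl
use strict;
use warnings;

# Toy model only: patterns here are plain element names, not EventPath.
# A rule's score is its position in the list, so the last match wins.
sub pick_action {
    my ( $rules, $name ) = @_;
    my $winner;
    for ( my $i = 0 ; $i < @$rules ; $i += 2 ) {
        my ( $pattern, $action ) = @{$rules}[ $i, $i + 1 ];
        $winner = $action if $pattern eq $name;
    }
    return $winner;
}

my @rules = (
    'a' => sub { "first a rule" },
    'b' => sub { "b rule" },
    'a' => sub { "second a rule" },
);

my $result = pick_action( \@rules, 'a' )->();
print "$result\n";    # second a rule
```

Putting general rules first and more specific overrides later follows from this ordering.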
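There are several types of actions:

A code reference, which is called when the pattern matches:

    Rules => [
        'a' => \&foo,
        'b' => sub { print "got a <b>!\n" },
    ],

A SAX handler, given either by name or as an object; the matching events are sent to it:

    Handler => $h,  ## A downstream handler
    Rules => [
        'a' => "Handler",
        'b' => $h2,     ## Another handler
    ],

undef, which elides the matching events instead of acting on them:

    Rules => [
        '//node()' => $h,
        'b'        => undef,
    ],

A reference to a string of Perl code, which is run when the pattern matches:

    Rules => [
        'b' => \q{print "got a <b>!\n"},
    ],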
Note: this section describes EventPath and discusses differences between EventPath and XPath. If you are not familiar with XPath you may want to skim those bits; they're provided for the benefit of people coming from an XPath background but hopefully don't hinder others. A working knowledge of SAX is necessary for the advanced bits.
EventPath patterns may match the document, elements, attributes, text nodes, comments, processing instructions, and (not yet implemented) namespace nodes. Patterns like this are referred to as "location paths" and resemble Unix file paths or URIs in appearance and functionality.
Location paths describe a location (or set of locations) in the document
much the same way a filespec describes a location in a filesystem. The
path /a/b/c could refer to a directory named c on a filesystem or
a set of <c> elements in an XML document. In either case,
the path indicates that c must be a child of b, b must be
a child of a, and a is a root level entity. More examples later.
EventPath patterns may also extract strings, numbers and boolean values
from a document. These are called "expression patterns" and are only
said to match when the values they extract are "true" according to XPath
semantics (XPath truth-ness differs from Perl truth-ness, see
EventPath Truth below). Expression patterns look
like string( /a/b/c ) or number( part-number ), and if the result
is true, the action will be executed and the result can be retrieved
using the xvalue() method.
TODO: rename xvalue to be ep_result or something.
We cover patterns in more detail below, starting with some examples.
If you'd like to get some experience with pattern matching in an interactive XPath web site, there's a really good XPath/XSLT based tutorial and lab at http://www.zvon.org/xxl/XPathTutorial/General/examples.html.
Two kinds of actions are supported: Perl subroutine calls and dispatching events to other SAX processors. When a pattern matches, the associated action is executed.
This is perhaps best introduced by some examples. Here's a routine that runs a rather knuckleheaded document through a dispatcher:
use XML::SAX::Machines qw( Pipeline );
sub run { Pipeline( shift )->parse_string( <<XML_END ) }
<stooges>
<stooge name="Moe" hairstyle="bowl cut">
<attitude>Bully</attitude>
</stooge>
<stooge name="Shemp" hairstyle="mop">
<attitude>Klutz</attitude>
<stooge name="Larry" hairstyle="bushy">
<attitude>Middleman</attitude>
</stooge>
</stooge>
<stooge name="Curly" hairstyle="bald">
<attitude>Fool</attitude>
<stooge name="Shemp" repeat="yes">
<stooge name="Joe" hairstyle="bald">
<stooge name="Curly Joe" hairstyle="bald" />
</stooge>
</stooge>
</stooge>
</stooges>
XML_END
Let's count the number of stooge characters in that document. To do that, we'd
like a rule that fires on almost all <stooge> elements:
my $count;
run(
XML::Filter::Dispatcher->new(
Rules => [
'stooge' => sub { ++$count },
],
)
);
print "$count\n"; ## 7
Hmmm, that's one too many: it's picking up on Shemp twice since the document
shows that Shemp had two periods of stoogedom. The second node has a
convenient repeat="yes" attribute we can use to ignore the duplicate.
We can ignore the duplicate element by adding a "predicate"
expression to the pattern to accept only those elements with no repeat
attribute. Changing that rule to
'stooge[not(@repeat)]' => ...
or even the more pedantic
'stooge[not(@repeat) or not(@repeat = "yes")]' => ...
yields the expected answer (6).
Now let's try to figure out the hairstyles the stooges wore. To extract just the names of hairstyles, we could do something like:
my %styles;
run(
XML::Filter::Dispatcher->new(
Rules => [
'stooge' => [
'string( @hairstyle )' => sub { $styles{xvalue()} = 1 },
],
],
)
);
print join( ", ", sort keys %styles ), "\n";
which prints "bald, bowl cut, bushy, mop". That rule extracts the text
of each hairstyle attribute and xvalue() returns it.
The text contents of elements like <attitude> can also be
sussed out by using a rule like:
'string( attitude )' => sub { $styles{xvalue()} = 1 },
which prints "Bully, Fool, Klutz, Middleman".
Finally, we might want to correlate hairstyles and attitudes by using a rule like:
my %styles;
run(
XML::Filter::Dispatcher->new(
Rules => [
'stooge' => [
'concat(@hairstyle,"=>",attitude)' => sub {
$styles{$1} = $2 if xvalue() =~ /(.+)=>(.+)/;
},
],
],
)
);
print map "$_ => $styles{$_}\n", sort keys %styles;
which prints:
bald => Fool
bowl cut => Bully
bushy => Middleman
mop => Klutz
When a blessed object $foo is provided as an action for a rule:

    my $foo = XML::Handler::Foo->new();
    my $d = XML::Filter::Dispatcher->new(
        Rules => [
            'foo' => $foo,
        ],
        Handler => $h,
    );

the selected SAX events are sent to $foo.

If the selected event is a start_document() or start_element()
event and it is selected without using the start-document:: or
start-element:: axes, then the handler ($foo) replaces the
existing handler of the dispatcher ($h) until after the corresponding
end_...() event is received.
This causes the entire element (<foo>) to be sent to the
temporary handler ($foo). In the example, each <foo>
element will be sent to $foo as a separate document, so if
(whitespace shown as underscores)
<root>
____<foo>....</foo>
____<foo>....</foo>
____<foo>....</foo>
</root>
is fed in to $d, then $foo will receive 3 separate
<foo>...</foo>
documents (start_document() and end_document() events are emitted
as necessary) and $h will receive a single document without any
<foo> elements:
<root>
____
____
____
</root>
This can be useful for parsing large XML files in small chunks, often in conjunction with XML::Simple or XML::Filter::XSLT.
But what if you don't want $foo to see three separate documents?
What if you're excerpting chunks of a document to create another
document? This can be done by telling the dispatcher to emit the main
document to $foo and using rules with an action of undef to elide
the events that are not wanted. This setup:
my $foo = XML::Handler::Foo->new();
my $d = XML::Filter::Dispatcher->new(
Rules => [
'/' => $foo,
'bar' => undef,
'foo' => $foo,
],
Handler => $h,
);
when fed this document:
<root>
__<bar>hork</bar>
__<bar>
__<foo>....</foo>
__<foo>....</foo>
__<foo>....</foo>
__<hmph/>
__</bar>
__<hey/>
</root>
results in $foo receiving a single document of input looking like
this:
<root>
__
__<foo>....</foo>
__<foo>....</foo>
__<foo>....</foo>
__<hey/>
</root>
XML::Filter::Dispatcher keeps track of each handler and sends
start_document() and end_document() at the appropriate times, so
the <foo> elements are "hoisted" out of the <bar>
element in this example without any untoward ..._document() events.
TODO: support forwarding to multiple documents at a time. At present, using multiple handlers for the same event is not supported.
TODO: At the moment, selecting and forwarding individual events is not supported. When it is, any events other than those covered above will be forwarded individually.
XML::Filter::Dispatcher checks when it is first loaded to see if Devel::TraceSAX is loaded. If so, it will emit tracing messages. Typical use looks like
    perl -d:TraceSAX script_using_x_f_dispatcher
If you are use()ing Devel::TraceSAX in source code, make sure that it is
loaded before XML::Filter::Dispatcher.
TODO: Allow tracing to be enabled/disabled independently of Devel::TraceSAX.
XML::Filter::Dispatcher offers namespace support in matching and by
providing functions like local-name(). If the documents you are
processing don't use namespaces, or you only care about elements and
attributes in the default namespace (i.e. without a "foo:" namespace
prefix), then you need not worry about engaging
XML::Filter::Dispatcher's namespace support. You do need it if your
patterns contain the foo:* construct (that * is literal).
To specify the namespaces, pass in an option like
Namespaces => {
"" => "uri0", ## Default namespace
prefix1 => "uri1",
prefix2 => "uri2",
},
Then use prefix1: and prefix2: wherever necessary in patterns.
A missing prefix on an element always maps to the default namespace URI, which is "" by default. Attributes are treated likewise, though this is probably a bug.
If your patterns contain prefixes (like the foo: in foo:bar), and
you don't provide a Namespaces option, then the element names will
silently be matched literally as "foo:bar", whether or not the source
document declares namespaces. This may change, as it may cause too
much user confusion.
XML::Filter::Dispatcher follows the XPath specification rather literally
and does not allow :*, which you might think would match all nodes in
the default namespace. To do this, assign a prefix to the default
namespace URI:
Namespaces => {
"" => "uri0", ## Default namespace
"default" => "uri0", ## Default namespace
prefix1 => "uri1",
prefix2 => "uri2",
},
then use "default:*" to match it.
CURRENT LIMITATION: Currently, all rules must exist in the same namespace context. This will be changed when I need to change it (contact me if you need it changed). The current idea is to allow a special function "Namespaces( { .... }, @rules )" that enables a temporary namespace context, although abbreviated forms might be possible.
"EventPath" patterns are that large subset of XPath patterns that can be run in a SAX environment without a DOM. There are a few crucial differences between the environments that EventPath and XPath each operate in.
XPath operates on a tree of "nodes" where each entity in an XML document
has only one corresponding node. The tree metaphor used in XPath has a
literal representation in memory. For instance, an element
<foo> is represented by a single node which contains other
nodes.
EventPath operates on a series of events instead of a tree of nodes.
For instance elements, which are represented by nodes in DOM trees, are
represented by two event method calls, start_element() and
end_element(). This means that EventPath patterns may match in a
start_...() method or an end_...() method, or even both if you try
hard enough.
The only times an EventPath pattern will match in an
end_...() method are when the pattern refers to an element's contents
or it uses the XXXX function (described below) to do so
intentionally.
The tree metaphor is used to arrange and describe the relationships between events. In the DOM trees an XPath engine operates on, a document or an element is represented by a single entity, called a node. In the event streams that EventPath operates on, documents and element
EventPath is not a standard of any kind, but XPath can't cope with situations where there is no DOM, and there are some features that EventPath needs (start_element() vs. end_element() processing, for example) that are not compatible with XPath.
Some of the features of XPath require that the source document be fully translated in to a DOM tree of nodes before the features can be evaluated. (Nodes are things like elements, attributes, text, comments, processing instructions, namespace mappings etc).
These features are not supported and are not likely to be. You might want to use XML::Filter::XSLT for "full" XPath support (though it comes in an XSLT framework) or wait for XML::Twig to grow SAX support.
Rather than build a DOM, XML::Filter::Dispatcher only keeps a bare minimum of nodes: the current node and its parent, grandparent, and so on, up to the document ("root") node (basically the /ancestor-or-self:: axis). This is called the "context stack", although you may not need to know that term unless you delve in to the guts.
EventPath borrows a lot from XPath, including its notion of truth. This is different from Perl's notion of truth, presumably to make document processing easier. Here's a table that may help; the important differences are towards the end:
    Expression      EventPath  XPath    Perl
    ==========      =========  =====    ====
    false()         FALSE      FALSE    n/a (not applicable)
    true()          TRUE       TRUE     n/a
    0               FALSE      FALSE    FALSE
    -0              FALSE**    FALSE    n/a
    NaN             FALSE**    FALSE    n/a (not fully, anyway)
    1               TRUE       TRUE     TRUE
    ""              FALSE      FALSE    FALSE
    "1"             TRUE       TRUE     TRUE
    "0"             TRUE       TRUE     FALSE

    *  To be regarded as a bug in this implementation
    ** Only partially implemented/supported in this implementation
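The XPath column of the table can be reproduced in plain Perl. The sketch below is illustrative only: xpath_true() is a hypothetical helper, not part of this module, and it applies the XPath 1.0 boolean() rules to a value whose type is stated explicitly. Its last row shows the "0" case, where XPath and Perl disagree.

```perl
use strict;
use warnings;

# Hypothetical helper: XPath 1.0 boolean() applied to a typed Perl value.
sub xpath_true {
    my ( $type, $value ) = @_;
    if ( $type eq 'boolean' ) { return $value ? 1 : 0 }
    if ( $type eq 'string' )  { return length($value) ? 1 : 0 }
    if ( $type eq 'number' ) {
        return 0 if $value != $value;    # NaN never equals itself
        return $value == 0 ? 0 : 1;      # 0 and -0 are both false
    }
    die "unsupported type '$type'\n";
}

# Print the XPath result next to Perl's own idea of truth.
for my $case ( [ number => 0 ], [ number => 1 ], [ string => "" ],
               [ string => "1" ], [ string => "0" ] ) {
    my ( $type, $value ) = @$case;
    printf "%-7s %-4s XPath: %-6s Perl: %s\n", $type, "'$value'",
        xpath_true( $type, $value ) ? "TRUE" : "FALSE",
        $value                      ? "TRUE" : "FALSE";
}
```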
Note: it looks like XPath 2.0 is defining a more workable concept
for document processing that uses something resembling Perl's empty
lists, (), to indicate empty values, so "" and () will be
distinct and "0" can be interpreted as false like in Perl. XPath2
is not provided by this module yet and won't be for a long time
(patches welcome ;).
All of this means that only a portion of XPath is available. Luckily, that portion is also quite useful. Here are examples of working XPath expressions, followed by known unimplemented features.
TODO: There is also an extension function available to differentiate between
start_... and end_... events. By default
    Expression          Event Type      Description (event type)
    ==========          ==========      ========================
    /                   start_document  Selects the document node
    /a                  start_element   Root elt, if it's "<a ...>"
    a                   start_element   All "a" elements
    b//c                start_element   All "c" descendants of "b" elt.s
    @id                 start_element   All "id" attributes
    string( foo )       end_element     Matches at the first </foo> or <foo/>
                                        in the current element;
                                        xvalue() returns the
                                        text contained in "<foo>...</foo>"
    string( @name )     start_element   The first "name" attribute;
                                        xvalue() returns the
                                        text of the attribute.
There are several APIs provided: general, xstack, and EventPath variable handling.
The general API provides new() and xvalue(), xvalue_type(), and
xrun_next_action().
The variables API provides xset_var() and xget_var().
The xstack API provides xadd(), xset(), xoverwrite(),
xpush(), xpeek() and xpop().
All of the "xfoo()" APIs may be called as a method or, within rule handlers, called as a function:
$d = XML::Filter::Dispatcher->new(
Rules => [
"/" => sub {
xpush "foo\n";
print xpeek; ## Prints "foo\n"
my $self = shift;
print $self->xpeek; ## Also prints "foo\n"
},
],
);
print $d->xpeek; ## Yup, prints "foo\n" as well.
This dual nature allows you to import the APIs as functions and call them using a concise function-call style, or to leave them as methods and use object-oriented style.
Each call may be imported by name:
use XML::Filter::Dispatcher qw( xpush xpeek );
or by one of three API category tags:
    use XML::Filter::Dispatcher ":general";    ## xvalue()
    use XML::Filter::Dispatcher ":variables";  ## xset_var(), xget_var()
    use XML::Filter::Dispatcher ":xstack";     ## xpush(), xpop(), and xpeek()
or en masse:
use XML::Filter::Dispatcher ":all";
my $f = XML::Filter::Dispatcher->new(
Rules => [ ## Order is significant
"/foo/bar" => sub {
## Code to execute
},
],
);
new() must be called as a method, unlike the other API calls provided.
"string( foo )" => sub { my $v = xvalue }, # if imported
"string( foo )" => sub { my $v = shift->xvalue }, # if not
xvalue() returns the result of the last EventPath expression evaluated; this is
the result that fired the current rule. The examples above fetch the text
of <foo> elements, for instance.
For matching expressions, this is equivalent to $_[1] in action subroutines.
xvalue_type() returns the type of the result returned by xvalue(). This is either a SAX event name or "attribute" for path rules ("//a"), "" for a string, or "HASH" for a hash (note that struct() also returns a hash; these types are Perl data structure types, not EventPath types).
This is the same as xeventtype for all rules that don't evaluate functions like "string()" as their top level expression.
xeventtype returns the type of the current SAX event.
xrun_next_action() runs the next action for the current node. Ordinarily, XML::Filter::Dispatcher runs only one action per node; this allows an action to call down to the next action.
This is especially useful in filters that tweak a document on the way by. This tweaky sort of filter establishes a default "pass-through" rule and then additional override rules to tweak the values being passed through.
Let's suppose you want to convert some mtimes from seconds since the epoch to a human readable format. Here's a set of rules that might do that:
Rules => [
'node()' => "Handler", ## Pass everything through by default.
'file[@mtime]' => sub { ## intercept and tweak the node.
my $attr = $_[1]->{Attributes}->{"{}mtime"};
## Localize the changes: never assume that it is safe
## to alter SAX elements on the way by in a general purpose
## filter. Some smart aleck might send the same events
## to another filter with a Tee fitting or even back
## through your filter multiple times from a cache.
local $attr->{Value} = localtime $attr->{Value};
## Now that the changes are localised, fall through to
## the default rule.
xrun_next_action;
## We could emit other events here as well, but need not
## in this example.
},
],
EventPath variables may be set in the current context using
xset_var(), and accessed using xget_var(). Variables set in a
given context are visible to all child contexts. If you want a variable
to be set in an enclosed context and later retrieved in an enclosing
context, you must set it in the enclosing context first, then alter it
in the enclosed context, then retrieve it.
EventPath variables are typed.
EventPath variables set in a context are visible within that context and all enclosed contexts, but not outside of them.
"foo" => sub { xset_var( bar => string => "bingo" ) }, # if imported
"foo" => sub { shift->xset_var( bar => boolean => 1 ) },
xset_var() sets an XPath variable visible in the current context and all child contexts. It will not be visible in parent contexts or sibling contexts.
Legal types are boolean, number, and string. Node sets and
nodes are unsupported at this time, and "other" types are not useful
unless you work in your own functions that handle them.
Variables are visible as $bar variable references in XPath expressions and
using xget_var in Perl code. Setting a variable to a new value temporarily
overrides any existing value, somewhat like using Perl's local.
"bar" => sub { print xget_var( "bar" ) }, # if imported
"bar" => sub { print shift->xget_var( "bar" ) },
xget_var() retrieves a single variable from the current context. This may have been set by a parent or by a previous rule firing on this node, but not by children or preceding siblings.
Returns undef if the variable is not set (or if it was set to undef).
"bar" => sub { print xget_var_type( "bar" ) }, # if imported
"bar" => sub { shift->xget_var_type( "bar" ) },
xget_var_type() retrieves the type of a variable from the current context. This may have been set by a parent or by a previous rule firing on this node, but not by children or preceding siblings.
Returns undef if the variable is not set.
XML::Filter::Dispatcher allows you to register handlers using
set_handler(), to retrieve them using get_handler(), and to refer to them
by name in actions. These are part of the "general API".
You may use any string for handler names that you like, including strings with spaces. It is wise to avoid those standard, rarely used handlers recognized by parsers, such as:
DTDHandler
ContentHandler
DocumentHandler
DeclHandler
ErrorHandler
EntityResolver
LexicalHandler
unless you are using them for the stated purpose. (List taken from XML::SAX::EventMethodMaker).
Handlers may be set in the constructor in two ways: by using a name ending in "Handler" and passing it as a top level option:
my $f = XML::Filter::Dispatcher->new(
Handler => $h,
FooHandler => $foo,
BarHandler => $bar,
Rules => [
...
],
);
Or, for oddly named handlers, by passing them in the Handlers hash:
my $f = XML::Filter::Dispatcher->new(
Handlers => {
Manny => $foo,
Moe => $bar,
Jack => $bat,
},
Rules => [
...
],
);
Once declared in new(), handler names can be used as actions. The "well known" handler name "Handler" need not be predeclared.
For example, this forwards all events except the start_element()
and end_element() events for the root element's children, thus
"hoisting" everything two levels below the root up a level:
    Rules => [
        'node()' => "Handler",  ## Pass everything through by default.
        '/*/*'   => undef,      ## Later rules win, so this overrides it.
    ],
By default, no events are forwarded to any handlers: you must send individual events to individual handlers.
Normally, when a handler is used in this manner, XML::Filter::Dispatcher
makes sure to send start_document() and end_document() events to
it just before the first event and just after the last event. This
prevents sending the document events unless a handler actually receives
other events, which is what most people expect (the alternative would be
to preemptively always send a start_document() to all handlers
when the dispatcher receives its start_document(): ugh).
To disable this for all handlers, pass the SuppressAutoStartDocument => 1
option.
$self->set_handler( $handler );
$self->set_handler( $name => $handler );
The xstack is a stack mechanism provided by XML::Filter::Dispatcher that is automatically unwound after end_element, end_document, and all other events other than start_element or start_document. This sounds limiting, but it's quite useful for building data structures that mimic the structure of the XML input. I've found this to be common when dealing with data structures in XML and creating nested hierarchies of objects and/or Perl data structures.
Here's an example of how to build and return a graph:
use Graph;
my $d = XML::Filter::Dispatcher->new(
Rules => [
## These two create and, later, return the Graph object.
'graph' => sub { xpush( Graph->new ) },
'end::graph' => \&xpop,
## Every vertex must have a name, so collect it and add it
## to the Graph object using its add_vertex( $name ) method.
'vertex' => [ 'string( @name )' => sub { xadd } ],
## Edges are a little more complex: we need to collect the
## from and to attributes, which we do using a hash, then
## pop the hash and use it to add an edge. You could
## also use a single rule, see below.
'edge' => [ 'string()' => sub { xpush {} } ],
'edge/@*' => [ 'string()' => sub { xset } ],
'end::edge' => sub {
my $edge = xpop;
xpeek->add_edge( @$edge{"from","to"} );
},
],
);
my $graph = QB->new( "graph", <<END_XML )->playback( $d );
<graph>
<vertex name="0" />
<edge from="1" to="2" />
<edge from="2" to="1" />
</graph>
END_XML
print $graph, $graph->is_sparse ? " is sparse!\n" : "\n";
should print "0,1-2,2-1 is sparse!\n".
This is good if you can tell what object to add to the stack before seeing content. Some XML parsing is more general than that: if you see no child elements, you want to create one class to contain just character content, otherwise you want to add a container class to contain the child nodes.
A faster alternative to the 3 edge rules relies on the fact that SAX's start_element events carry the attributes, so you can actually use a single rule instead of the three we show above:
'edge' => sub {
xpeek->add_edge(
$_[1]->{Attributes}->{"{}from"}->{Value},
$_[1]->{Attributes}->{"{}to" }->{Value},
);
},
xpush() pushes values on to the xstack. These will be removed from the xstack at
the end of the current element. The topmost item on the
xstack is available through the xpeek method. Items xpushed before
the first element (usually in the start_document() event) remain on
the stack after the document has been parsed and a call like
my $elt = $dispatcher->xpop;
can be used to retrieve them.
xadd() tries to add a possibly named item to the element on the top of the stack and then pushes the item on to the stack. It makes a guess about how to add items depending on what the current top of the stack is.
xadd $name, $new_item;
does this:
    Top of Stack    Action
    ============    ======
    scalar          xpeek .= $new_item;
    SCALAR ref      ${xpeek} .= $new_item;
    ARRAY ref       push @{xpeek()}, $new_item;
    HASH ref        push @{xpeek->{$name}}, $new_item;
    blessed object  xpeek->$method( $new_item );
The $method in the last item is one of (in order) "add_$name", "push_$name", or "$name".
After the above action, an
xpush $new_item;
is done.
$name defaults to the LocalName of the current node if it is an attribute or element, so
xadd $foo;
will DWYM. TODO: search up the current node's ancestry for a LocalName when handling other event types.
If no parameters are provided, xvalue is used.
If the stack is empty, it just xpush()es on the stack.
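To make the table concrete, here is a stand-alone toy version of the same decision logic. It is a sketch written from the description above, not the module's code: toy_xadd() and the @stack array are hypothetical stand-ins for xadd() and the xstack, and the method lookup via can() is an assumption about how "one of (in order)" might be resolved.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

my @stack;    # stands in for the xstack

# Toy version of the table: add $new_item to the top of the stack
# according to the top's type, then push $new_item itself.
sub toy_xadd {
    my ( $name, $new_item ) = @_;
    if (@stack) {
        my $top = $stack[-1];
        if ( blessed($top) ) {
            # Assumption: use the first of these methods the object supports.
            my ($method) = grep { $top->can($_) } "add_$name", "push_$name", $name;
            die "no method to add '$name'\n" unless $method;
            $top->$method($new_item);
        }
        elsif ( ref $top eq 'ARRAY' )  { push @$top, $new_item }
        elsif ( ref $top eq 'HASH' )   { push @{ $top->{$name} }, $new_item }
        elsif ( ref $top eq 'SCALAR' ) { $$top .= $new_item }
        elsif ( !ref $top )            { $stack[-1] .= $new_item }
        else                           { die "cannot add to a " . ref($top) . "\n" }
    }
    push @stack, $new_item;
}

toy_xadd( root  => {} );       # empty stack: the hash is just pushed
toy_xadd( names => [] );       # stored under $stack[0]{names}, then pushed
toy_xadd( name  => "Moe" );    # appended to that array, then pushed
print scalar(@stack), " items; first name: $stack[0]{names}[0][0]\n";
# 3 items; first name: Moe
```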
xset() is like xadd(), but tries to set a named value. It dies if the value is
already defined (so duplicate values aren't silently ignored).
xset $name, $new_item;
does this:
    Top of Stack    Action
    ============    ======
    scalar          xpeek = $new_item;
    SCALAR ref      ${xpeek} = $new_item;
    HASH ref        xpeek->{$name} = $new_item;
    blessed object  xpeek->$name( $new_item );
Trying to xset any other types results in an exception.
After the above action (except when the top is a scalar or SCALAR ref), an
xpush $new_item;
is done so that more may be added to the item.
$name defaults to the LocalName of the current node if it is an attribute or element, so
xset $foo;
will DWYM. TODO: search up the current node's ancestry for a LocalName when handling other event types.
If no parameters are provided, xvalue is used.
xoverwrite() is exactly like xset() but does not complain if the value has already been set with xadd(), xset() or xoverwrite().
Rules => [
"foo" => sub {
my $elt = $_[1];
xpeek->set_name( $elt->{Attributes}->{"{}name"} );
},
"/end::*" => sub {
my $self = shift;
XXXXXXXXXXXXXXXXXXXX
}
],
xpeek() returns the top element on the xstack, which was the last thing pushed in the current context. It throws an exception if the xstack is empty. To check for an empty stack, use eval:
my $stack_not_empty = eval { xpeek };
To peek down the xstack, use a Perlish index value. The most recently pushed element is index number -1:
    xpeek( -1 ); ## Same as $self->xpeek
The first element pushed on the xstack is element 0:
    xpeek( 0 );
An exception is thrown if the index is off either end of the stack.
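The indexing rules just described behave like ordinary Perl array indexing with bounds checks added. The following stand-alone sketch illustrates them; toy_xpeek() is a hypothetical helper that takes an array reference, whereas the real xpeek() works on the dispatcher's own xstack, and the error messages are made up for the example.

```perl
use strict;
use warnings;

# Toy model of the indexing rules: -1 is the most recent push,
# 0 is the first push, and anything out of range is an exception.
sub toy_xpeek {
    my ( $stack, $index ) = @_;
    $index = -1 unless defined $index;
    my $size = @$stack;
    die "xstack is empty\n" unless $size;
    die "index $index is off the end of the xstack\n"
        if $index >= $size || $index < -$size;
    return $stack->[$index];
}

my @stack = ( "root", "child", "grandchild" );
print toy_xpeek( \@stack ),     "\n";    # grandchild
print toy_xpeek( \@stack, 0 ),  "\n";    # root
print toy_xpeek( \@stack, -3 ), "\n";    # root

eval { toy_xpeek( \@stack, 3 ) };
print "died: $@" if $@;    # died: index 3 is off the end of the xstack
```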
my $d = XML::Filter::Dispatcher->new(
Rules => [
....rules to build an object hierarchy...
],
);
my $result = $d->xpop;
xpop() removes an element from the xstack and returns it. It is usually called in an end_document handler or after the document returns to retrieve a "root" object placed on the stack before the root element was started.
xstack_empty is handy for detecting a nonempty stack:
warn xpeek unless xstack_empty;
Because xpeek and xpop throw exceptions on an empty stack,
xstack_empty is needed to detect whether it's safe to call them.
xstack_max is handy for walking the stack:
for my $i ( reverse 0 .. xstack_max ) { ## from top to bottom
use BFD;d xpeek( $i );
}
Because xpeek and xpop throw exceptions on an empty stack,
xstack_max may be used to walk the stack safely.
This section assumes familiarity with XPath in order to explain some of the particulars and side effects of the incremental XPath engine.
Expressions like 0, false(), 1, and 'a' have no location
path and apply to all nodes (including namespace nodes and processing
instructions).

... && or == instead of and or =.

SAX does not guarantee that characters events will be aggregated as
much as possible, as text() nodes are in XPath. Generally, however,
this is not a problem; instead of writing

    "quotation/text()" => sub {
        ## BUG: may be called several times within each quotation elt.
        my $self = shift;
        print "He said '", $self->current_node->{Data}, "'\n";
    },

write

    "string( quotation )" => sub {
        my $self = shift;
        print "He said '", xvalue, "'\n";
    },

Consider the XML:

    <quotation>I am <!-- bs -->GREAT!<!-- bs --></quotation>

Rules like .../text() will fire twice, which is not what is needed
here.

Rules like string( ... ) will fire once, at the end_element event,
with all descendant text of quotation as the expression result.

... GREAT! example will still generate more
than one event due to the comment).

... start_element. This node type is used by the * node type test.
... /a/start-document::* or //start-cdata::*, but it may miss some.
This is meant to help in debugging user code; the eventual goal is to
catch all such nonsense.

attribute:: (XPath, attribute)

child:: (XPath)

descendant:: (XPath)

descendant-or-self:: (XPath)

end:: (SAX, end_element)
    Like child::, but selects the end_element event of the
    element context node. This is usually used in preference to
    end-element:: due to its brevity.

end-document:: (SAX, end_document)
    Like self::, but selects the end_document event of the document
    context node.

end-element:: (SAX, end_element)
    ... end::.
    Like child::, but selects the end_element event of the element
    context node. This is like end::, but different from
    end-document::.

following:: (XPath, not soon)

following-sibling:: (XPath, not soon)

namespace:: (XPath, namespace, todo)

parent:: (XPath, todo (will be limited))

preceding:: (XPath, not soon)

preceding-sibling:: (XPath, not soon)

self:: (XPath)

start:: (SAX, start_element)
    ... start_element events. This is
    usually used in preference to start-element:: due to its brevity.
    start:: is rarely used to drive code handlers because rules that
    match document or element events already only fire code handlers on the
    start_element event and not the end_element event (however, when a
    SAX handler is used, such expressions send both start and end events to
    the downstream handler, so start:: has utility there).

start-document:: (SAX, start_document)
    ... start-element::, and is not necessary given start::.
    Like self::, but selects only the start_document events.

start-element:: (SAX, start_element)
    ... start::.
    Like child::, but selects only the start_element events.

normalize-space() is equivalent to normalize-space(.).

    string( 10 );
    string( /a/b/c );
    string( @id );

string() is equivalent to string(.).

... number( /a/b/c )).

number() is equivalent to number(.).

... start_element and end_element events and start_document and
end_document events.

By using the is-start-event() or is-end-event() functions in a
predicate, the rule may be forced to fire only on end events or on both start
and end events (using a [is-start-event() or is-end-event()] idiom).

... text() handlers fire once per text node
instead of once per characters() event.

... add_rule(), remove_rule(), set_rules() methods.

Pass the Assume_xvalue => 0 flag to tell X::F::D not to support xvalue and xvalue_type, which lets it skip some instructions and run faster.
Pass the SortAttributes => 0 flag to prevent calling sort() for each element's attributes (note that Perl changes hashing algorithms occasionally, so setting this to 0 may expose ordering dependencies in your code).
NOTE: this section describes things that may change from version to version as I need different views in to the internals.
Set the option Debug => 1 to see the Perl code for the compiled ruleset. If you have GraphViz.pm and ee installed and working, set Debug => 2 to see a graph diagram of the intermediate tree generated by the compiler.
Set the env. var XFDSHOWBUFFERHIGHWATER=1 to see what events were postponed the most (in terms of how many events had to pile up behind them). This can be of some help if you experience lots of buffering or high latency through the filter. Latency here means the lag between when an event arrives at this filter and when it is dispatched to its actions. This will only report events that were actually postponed. If you have a 0 latency filter, the report will list no events.
Set the env. var XFDOPTIMIZE=0 to prevent all sorts of optimizations.
... perl,
especially across some platforms that it apparently isn't easily supported on.

This is more of a frustration than a limitation, but this class requires that
you pass in a type when setting variables (in the Vars ctor parameter or
when calling xset_var). This is so that the engine can tell what type a
variable is, since string(), number() and boolean() all treat the Perlian 0
differently depending on its type. In Perl the digit 0 means false,
0 or '0', depending on context, but it's a consistent semantic. When
passing a 0 from Perl-land to XPath-land, we need to give it a type so that
string() can, for instance, decide whether to convert it to '0' or
'false'.
...to Kip Hampton, Robin Berjon and Matt Sergeant for sanity checks and to James Clark (of Expat fame) for posting a Yacc XPath grammar where I could snarf it years later and add lots of Perl code to it.
Barrie Slaymaker <barries@slaysys.com>
Copyright 2002, Barrie Slaymaker, All Rights Reserved.
You may use this module under the terms of the Artistic or GNU Public licenses, at your choice. Also, a portion of XML::Filter::Dispatcher::Parser is covered by:
The Parse::Yapp module and its related modules and shell scripts are
copyright (c) 1998-1999 Francois Desarmenien, France. All rights
reserved.
You may use and distribute them under the terms of either the GNU
General Public License or the Artistic License, as specified in the
Perl README file.
Note: Parse::Yapp is only needed if you want to modify lib/XML/Filter/Dispatcher/Grammar.pm