Pod::Abstract::Path - Search for POD nodes matching a path within a document tree.


Pod-Abstract documentation Contained in the Pod-Abstract distribution.

NAME

Pod::Abstract::Path - Search for POD nodes matching a path within a document tree.

SYNOPSIS

 /head1(1)/head2          # All head2 elements under 
                          # the 2nd head1 element
 //item                   # All items anywhere
 //item[@label =~ {^\*$}] # All items with '*' labels.
 //head2[/hilight]        # All head2 elements containing
                          # "hilight" elements

 # Top level head1s containing head2s that have headings matching
 # "NAME", and also have at least one list somewhere in their
 # contents.
 /head1[/head2[@heading =~ {NAME}]][//over]

 # Top level headings having the same title as the following heading.
 /head1[@heading eq >>@heading]

 # Top level headings containing at least one subheading with the same
 # name.
 /head1[@heading eq ./head2@heading]

DESCRIPTION

Pod::Abstract::Path is a path selection syntax that allows quick and easy traversal of Pod::Abstract documents. Although the syntax is simple, it can express quite complex queries.

Not all of the designed features have yet been implemented, but it is currently quite useful, and all of the filters in paf make use of Pod Paths.

SYMBOLS:

/

Selects children of the left hand side.

//

Selects all descendants of the left hand side.
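A minimal sketch of how / and // differ, assuming a hypothetical node made of a plain hash (a type plus a list of children) instead of a real Pod::Abstract::Node. The // expansion here is depth first, listing each node before its own children, which is the order the module's expand_all builds.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Pod::Abstract node: a hash holding a type
# and an array of child nodes.
sub node { my ($type, @kids) = @_; return { type => $type, children => [@kids] } }

# "/": the direct children of every node in the input list.
sub select_children {
    my ($ilist) = @_;
    return [ map { @{ $_->{children} } } @$ilist ];
}

# "//": every descendant, depth first, each node before its own children.
sub expand_all {
    my ($n) = @_;
    my @r;
    for my $c (@{ $n->{children} }) {
        push @r, $c, expand_all($c);
    }
    return @r;
}

sub select_all {
    my ($ilist) = @_;
    return [ map { expand_all($_) } @$ilist ];
}

my $doc = node('[ROOT]',
    node('head1', node('head2', node(':paragraph')), node('over')),
    node('head1'));

my @kids = map { $_->{type} } @{ select_children([$doc]) };
my @all  = map { $_->{type} } @{ select_all([$doc]) };
print "children: @kids\n";   # children: head1 head1
print "all: @all\n";         # all: head1 head2 :paragraph over head1
```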

.

Selects the current node - this is a NOP that can be used in expressions.

..

Selects the parent node. If multiple nodes are selected, the parents of all of them are included.

^

Selects the root node of the tree for the current node. This allows you to escape from a nested expression. Note that this is the ROOT node, not the node that you started from.

If you want to evaluate an expression from a node as though it were the root node, the easiest ways are to detach or dup it - otherwise the root operator will find the original root node.
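The .. and ^ steps can be sketched the same way, again with hypothetical hash nodes that also carry a parent link. Two nodes that share a parent both report it, and every node in one tree reports the same root; those repeats are what filter_unique later collapses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Pod::Abstract::Node: a hash with a type,
# children, and a parent link set when the node is attached.
sub node {
    my ($type, @kids) = @_;
    my $n = { type => $type, children => [@kids] };
    $_->{parent} = $n for @kids;
    return $n;
}

# "..": the parent of each input node; nodes without one drop out.
sub select_parents {
    my ($ilist) = @_;
    return [ grep { defined } map { $_->{parent} } @$ilist ];
}

# "^": follow parent links up to the top of the tree.
sub root_of {
    my ($n) = @_;
    $n = $n->{parent} while $n->{parent};
    return $n;
}

sub select_root {
    my ($ilist) = @_;
    return [ map { root_of($_) } @$ilist ];
}

my @paras = (node(':paragraph'), node(':paragraph'));
my $doc   = node('[ROOT]', node('head1', @paras));

my $parents = select_parents(\@paras);
my $roots   = select_root(\@paras);
print scalar(@$parents), " parents, both ", $parents->[0]{type}, "\n";  # 2 parents, both head1
print "same root: ", ($roots->[0] == $roots->[1] ? "yes" : "no"), "\n"; # same root: yes
print scalar(@{ select_parents([$doc]) }), "\n";                        # 0
```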

name, #cut, :text, :verbatim, :paragraph

Any element name, or symbolic type name, restricts the selection to only the elements of that type. e.g. "//:paragraph" selects all descendants, anywhere, and then restricts that set to only :paragraph type nodes.

Several names separated by spaces match any of those names, e.g. //head1 over matches all lists and all head1s.
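Name selection is a membership test on each node's type. This sketch, using hypothetical hashes with a type key rather than Pod::Abstract objects, shows how a run of names such as head1 over narrows a list that // has already expanded.

```perl
use strict;
use warnings;

# Keep every node whose type is any one of the given names. The nodes are
# hypothetical hashes with a "type" key, not Pod::Abstract objects.
sub select_name {
    my ($ilist, @names) = @_;
    my %wanted = map { $_ => 1 } @names;
    return [ grep { $wanted{ $_->{type} } } @$ilist ];
}

# Roughly what "//" might hand to the name filter in "//head1 over".
my @nodes = map { +{ type => $_ } } qw(head1 :paragraph over head2 head1 :verbatim);
my @kept  = map { $_->{type} } @{ select_name(\@nodes, 'head1', 'over') };
print "@kept\n";   # head1 over head1
```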

|, & (union and intersection)

Union takes expressions on either side and returns all nodes that are members of either set. Intersection returns only the nodes that are members of BOTH sets. These operators can extend a path, and can also be used inside [ expressions ] wherever a path is accepted (the left side of a regex match, or either side of a comparison operator). They are NOT logical and/or, although a similar effect can be achieved with them.
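A minimal sketch of the two set operations, assuming nodes are identified by a unique serial number, as the module's own comparisons are. Unlike this sketch's union_of (a hypothetical name), the module's select_union simply concatenates both result lists and relies on filter_unique to drop the repeats.

```perl
use strict;
use warnings;

# Hypothetical nodes: hashes with a unique "serial", standing in for the
# serial numbers that Pod::Abstract nodes use to establish identity.
my @n = map { +{ serial => $_ } } 0 .. 4;

# "|": everything from the left, then the right; a node found by both
# sides is kept once.
sub union_of {
    my ($left, $right) = @_;
    my %seen;
    return [ grep { !$seen{ $_->{serial} }++ } @$left, @$right ];
}

# "&": right-hand nodes that the left-hand side also selected.
sub intersect_of {
    my ($left, $right) = @_;
    my %in_left = map { $_->{serial} => 1 } @$left;
    return [ grep { $in_left{ $_->{serial} } } @$right ];
}

my $lhs = [ @n[0, 1, 2] ];
my $rhs = [ @n[2, 3] ];
my @u = map { $_->{serial} } @{ union_of($lhs, $rhs) };
my @i = map { $_->{serial} } @{ intersect_of($lhs, $rhs) };
print "union: @u\n";        # union: 0 1 2 3
print "intersect: @i\n";    # intersect: 2
```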

@attrname

The named attribute of the nodes on the left hand side. Current attributes are @heading for head1 through head4, and @label for list items.

[ expression ]

Select only the left hand elements that match the expression in the brackets. The expression will be evaluated from the point of view of each node in the current result set.

Expressions can be:

simple: [/head2]

A plain path is true if it matches at least one node. The above example is true if the node has at least one head2 node as a direct child.

regex match: [@heading =~ {FOO}]

A regex match will be true if the left hand expression has nodes that match the regular expression between the braces on the right hand side. The above example will match anything with a heading containing "FOO".

Optionally, the closing brace may be followed by the i modifier for case-insensitive matching, e.g. [@heading =~ {foo}i] matches headings containing foo, FOO or fOO.
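A sketch of the regex test, following the same idea as the module's test_regexp: compile the pattern from between the braces, add /i when the closing brace is followed by i, and count the left-hand texts that match. Plain strings stand in for node text, and count_matches is a hypothetical helper name.

```perl
use strict;
use warnings;

# Count how many texts match the pattern; the third argument plays the
# role of the trailing "i" after the closing brace.
sub count_matches {
    my ($texts, $pattern, $case_insensitive) = @_;
    my $re = $case_insensitive ? qr/$pattern/i : qr/$pattern/;
    return scalar grep { $_ =~ $re } @$texts;
}

my @headings = ('NAME', 'Synopsis', 'foo bar', 'fOO');
print count_matches(\@headings, 'foo', 0), "\n";   # 1
print count_matches(\@headings, 'foo', 1), "\n";   # 2
```

A non-zero count is treated as true, so the enclosing [ ] keeps the node.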

complement: [! /head2 ]

Reverses the remainder of the expression. The above example will match anything without a child head2 node.
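The simple test and the ! complement can be sketched together. Following the shape of match_expression, the inner path is evaluated from each candidate node, reduced to true or false, optionally inverted, and used to keep or drop the candidate. filter_by and the closure standing in for /head2 are hypothetical.

```perl
use strict;
use warnings;

# Keep the candidates for which the inner path finds at least one node,
# or, when $invert is set (the "!" case), those for which it finds none.
sub filter_by {
    my ($ilist, $inner_path, $invert) = @_;
    my @kept;
    for my $n (@$ilist) {
        my $ok = @{ $inner_path->($n) } > 0;   # simple test: any nodes matched?
        $ok = !$ok if $invert;
        push @kept, $n if $ok;
    }
    return \@kept;
}

# Stand-in for the path "/head2": the node's direct children of type head2.
my $child_head2 = sub { [ grep { $_->{type} eq 'head2' } @{ $_[0]{children} } ] };

my @head1s = (
    { name => 'A', children => [ { type => 'head2' } ] },
    { name => 'B', children => [ { type => ':paragraph' } ] },
);
print join(',', map { $_->{name} } @{ filter_by(\@head1s, $child_head2, 0) }), "\n";  # A
print join(',', map { $_->{name} } @{ filter_by(\@head1s, $child_head2, 1) }), "\n";  # B
```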

compare operators: e.g. [ /node1 eq /node2 ]

Matches nodes where the operator is satisfied for at least one pair of nodes. Nodes are compared by their body text, or by their POD source if they have no body. The right-hand side can be a constant string (single quoted: 'string') or a second expression. If two expressions are used, they are compared pairwise: every result node on the left is compared against every result node on the right. Both sides may contain nested expressions.

The following Perl-compatible operators are supported:

String: eq gt lt le ge ne

Numeric: == < > <= >= !=
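A sketch of the pairwise comparison. The module builds each comparison with a string eval; this version swaps in a dispatch table of the same Perl operators, which avoids eval and rejects unknown operators. Strings stand in for node text, a constant 'string' is treated as a one-element list, and any_pair is a hypothetical name. It returns as soon as one pair satisfies the operator, whereas the module counts all matching pairs; both are true in the same cases.

```perl
use strict;
use warnings;

# The supported string and numeric operators, as code references.
my %op = (
    eq   => sub { $_[0] eq $_[1] }, ne   => sub { $_[0] ne $_[1] },
    lt   => sub { $_[0] lt $_[1] }, gt   => sub { $_[0] gt $_[1] },
    le   => sub { $_[0] le $_[1] }, ge   => sub { $_[0] ge $_[1] },
    '==' => sub { $_[0] == $_[1] }, '!=' => sub { $_[0] != $_[1] },
    '<'  => sub { $_[0] <  $_[1] }, '>'  => sub { $_[0] >  $_[1] },
    '<=' => sub { $_[0] <= $_[1] }, '>=' => sub { $_[0] >= $_[1] },
);

# True if OP holds for at least one (left, right) pair.
sub any_pair {
    my ($left, $opname, $right) = @_;
    my $cmp = $op{$opname} or die "Unknown operator $opname";
    for my $l (@$left) {
        for my $r (@$right) {
            return 1 if $cmp->($l, $r);
        }
    }
    return 0;
}

print any_pair(['NAME', 'SYNOPSIS'], 'eq', ['SYNOPSIS', 'AUTHOR']), "\n"; # 1
print any_pair(['NAME'], 'eq', []), "\n";                                 # 0
print any_pair(['10', '3'], '>', ['5']), "\n";                            # 1
print any_pair(['10'], 'lt', ['5']), "\n";   # 1 (as strings, "10" sorts before "5")
```

The last line shows why the string and numeric operator families are kept separate: the same pair of values can compare differently under each.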

PERFORMANCE

Pod::Abstract::Path is not designed to be fast. It is designed to be expressive and useful, but evaluation involves successive expand, de-duplicate and linear-search passes, so running it over large documents with many nodes is not suitable for high-performance systems.

Simple expressions can be fast enough, but there is nothing to stop you from writing "//[<condition>]" and linear-searching all 10,000 nodes of your Pod document. Use with caution in interactive systems.

INTERFACE

The recommended way to evaluate Path expressions is the select method of Pod::Abstract::Node.

To use paths directly, for example from another module, pass an expression string to new, which lexes and parses it into a parse tree. Alternatively, build the parse tree yourself with lex and parse_path and pass it to new as the second argument. Then call process with a list of nodes; it returns the nodes that match. The same object, and therefore the same parse tree, can be reused to process several lists of nodes.

METHODS

filter_unique

Processing, especially with the ^ or .. operators, can generate many duplicate matches of the same nodes. After each step of the path, process filters the result list down to unique nodes (compared by serial number, keeping the first occurrence), so duplicates cannot compound from one step to the next.

This effectively means that //^ (however awful that is) will match one node only - just really inefficiently.
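A minimal sketch of the de-duplication step, assuming hash nodes with a serial key in place of Pod::Abstract nodes. It keeps the first occurrence of each node, so the surviving nodes stay in their original order.

```perl
use strict;
use warnings;

# Keep the first occurrence of each node, judged by its serial number.
sub filter_unique {
    my ($ilist) = @_;
    my %seen;
    return [ grep { !$seen{ $_->{serial} }++ } @$ilist ];
}

# "//" might select a thousand descendants; "^" maps each one back to
# the same root, which collapses to a single node.
my $root = { serial => 0 };
my $unique = filter_unique([ ($root) x 1000 ]);
print scalar(@$unique), "\n";   # 1

my ($x, $y) = ({ serial => 1 }, { serial => 2 });
my @order = map { $_->{serial} } @{ filter_unique([$x, $y, $x, $y]) };
print "@order\n";               # 1 2
```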

parse_path

Parse a list of lexemes and generate a driver tree for the process method. This is a simple recursive descent parser with one element of lookahead.
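A much-reduced sketch of the same recursive-descent structure, covering only /, //, names, (n) and |. The token names, regexes and the show helper are simplifications for illustration, not the module's own; the shape of the tree (an action, its arguments and a next link, with a union holding two subtrees) follows the module's parse_path and parse_l_path.

```perl
use strict;
use warnings;

# Tokenise a small subset of the path syntax.
sub lex {
    my ($s) = @_;
    my @t;
    while (length $s) {
        if    ($s =~ s{^//}{})         { push @t, ['ALL'] }
        elsif ($s =~ s{^/}{})          { push @t, ['CHILDREN'] }
        elsif ($s =~ s{^\|}{})         { push @t, ['UNION'] }
        elsif ($s =~ s{^\((\d+)\)}{})  { push @t, ['INDEX', $1] }
        elsif ($s =~ s{^([\w:#]+)}{})  { push @t, ['NAME', $1] }
        elsif ($s =~ s{^\s+}{})        { }
        else                           { die "Invalid token at: $s" }
    }
    return @t;
}

my %action = (ALL => 'select_all', CHILDREN => 'select_children',
              NAME => 'select_name', INDEX => 'select_index');

# path := l_path [ '|' path ]   (one token of lookahead picks the branch)
sub parse_path {
    my ($l) = @_;
    my $left = parse_l_path($l);
    if (@$l && $l->[0][0] eq 'UNION') {
        shift @$l;
        return { action => 'select_union', arguments => [ $left, parse_path($l) ] };
    }
    return $left;
}

# l_path := step l_path | <end>   (each step links to the rest via "next")
sub parse_l_path {
    my ($l) = @_;
    my $tok = $l->[0];
    return { action => 'end_select' } if !$tok || $tok->[0] eq 'UNION';
    shift @$l;
    return {
        action    => $action{ $tok->[0] },
        arguments => [ @$tok[1 .. $#$tok] ],
        next      => parse_l_path($l),
    };
}

# Flatten a parse tree into a string so its shape is easy to inspect.
sub show {
    my ($pt) = @_;
    return '' if $pt->{action} eq 'end_select';
    return '(' . join(' | ', map { show($_) } @{ $pt->{arguments} }) . ')'
        if $pt->{action} eq 'select_union';
    my $args = @{ $pt->{arguments} } ? "(@{ $pt->{arguments} })" : '';
    my $rest = show($pt->{next});
    return $pt->{action} . $args . ($rest eq '' ? '' : " $rest");
}

my $tree = parse_path([ lex('/head1(1)/head2 | //item') ]);
print show($tree), "\n";
# (select_children select_name(head1) select_index(1) select_children select_name(head2) | select_all select_name(item))
```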

AUTHOR

Ben Lilburne <bnej@mac.com>

COPYRIGHT AND LICENSE

package Pod::Abstract::Path;
use strict;
use warnings;

use Data::Dumper;

use Pod::Abstract::BuildNode qw(node);

$Data::Dumper::Indent = 1;

our $VERSION = '0.20';

use constant CHILDREN  => 1;  # /
use constant ALL       => 2;  # //
use constant NAME      => 3;  # head1
use constant INDEX     => 4;  # (3)
use constant L_SELECT  => 5;  # [
use constant ATTR      => 6;  # @label
use constant N_CMP     => 7;  # == != < <= > >=
use constant STRING    => 8;  # 'foobar'
use constant R_SELECT  => 9;  # ]
use constant NUM_OF    => 10; # %
use constant NOT       => 15; # !
use constant PARENT    => 16; # ..
use constant MATCHES   => 17; # =~
use constant REGEXP    => 18; # {<pattern>}
use constant NOP       => 19; # .
use constant PREV      => 20; # <<
use constant NEXT      => 21; # >>
use constant ROOT      => 22; # ^
use constant UNION     => 23; # |
use constant INTERSECT => 24; # &
use constant S_CMP     => 25; # eq lt gt le ge ne

sub new {
    my $class = shift;
    my $expression = shift;
    my $parse_tree = shift;
    
    if($parse_tree) {
        my $self = bless { 
            expression => $expression,
            parse_tree => $parse_tree
        }, $class;
        return $self;
    } else {
        my $self = bless { expression => $expression }, $class;
        
        my @lexemes = $self->lex($expression);
        my $parse_tree = $self->parse_path(\@lexemes);
        $self->{parse_tree} = $parse_tree;
        
        return $self;
    }
}

sub lex {
    my $self = shift;
    my $expression = shift;
    my @l = ( );

    # Digest expression into @l
    while($expression) {
        if($expression =~ m/^\/\//) {
            substr($expression,0,2) = '';
            push @l, [ ALL, undef ];
        } elsif($expression =~ m/^\//) {
            substr($expression,0,1) = '';
            push @l, [ CHILDREN, undef ];
        } elsif($expression =~ m/^\|/) {
            substr($expression,0,1) = '';
            push @l, [ UNION, undef ];
        } elsif($expression =~ m/^\&/) {
            substr($expression,0,1) = '';
            push @l, [ INTERSECT, undef ];
        } elsif($expression =~ m/^\[/) {
            substr($expression,0,1) = '';
            push @l, [ L_SELECT, undef ];
        } elsif($expression =~ m/^\]/) {
            substr($expression,0,1) = '';
            push @l, [ R_SELECT, undef ];
        } elsif($expression =~ m/^(eq|lt|gt|le|ge|ne)/) {
            push @l, [ S_CMP, $1 ];
            substr($expression,0,2) = '';
        } elsif($expression =~ m/^([#_\:a-zA-Z0-9]+)/) {
            push @l, [ NAME, $1 ];
            substr($expression, 0, length $1) = '';
        } elsif($expression =~ m/^\@([a-zA-Z0-9]+)/) {
            push @l, [ ATTR, $1 ];
            substr($expression, 0, length( $1 ) + 1) = '';
        } elsif($expression =~ m/^\(([0-9]+)\)/) {
            push @l, [ INDEX, $1 ];
            substr($expression, 0, length( $1 ) + 2) = '';
        } elsif($expression =~ m/^\{((\\\}|[^\}])+)\}([i]?)/) {
            # Try the escaped brace first so that \} does not end the pattern.
            my $case = $3 eq 'i' ? 0 : 1;
            push @l, [ REGEXP, $1, $case ];
            substr($expression, 0, length( $1 ) + 2 + length($3)) = '';
        } elsif($expression =~ m/^'(([^']|\\')+)'/) {
            push @l, [ STRING, $1 ];
            substr($expression, 0, length( $1 ) + 2) = '';
        } elsif($expression =~ m/^\=\~/) {
            push @l, [ MATCHES, undef ];
            substr($expression, 0, 2) = '';
        } elsif($expression =~ m/^\.\./) {
            push @l, [ PARENT, undef ];
            substr($expression, 0, 2) = '';
        } elsif($expression =~ m/^\^/) {
            push @l, [ ROOT, undef ];
            substr($expression, 0, 1) = '';
        } elsif($expression =~ m/^\./) {
            push @l, [ NOP, undef ];
            substr($expression, 0, 1) = '';
        } elsif($expression =~ m/^\<\</) {
            push @l, [ PREV, undef ];
            substr($expression, 0, 2) = '';
        } elsif($expression =~ m/^\>\>/) {
            push @l, [ NEXT, undef ];
            substr($expression, 0, 2) = '';
        } elsif($expression =~ m/^(==|!=|<=|>=)/) {
            push @l, [ N_CMP, $1 ];
            substr($expression,0,2) = '';
        } elsif($expression =~ m/^(<|>)/) {
            push @l, [ N_CMP, $1 ];
            substr($expression,0,1) = '';
        } elsif($expression =~ m/^\!/) {
            push @l, [ NOT, undef ];
            substr($expression, 0, 1) = '';
        } elsif($expression =~ m/^\%/) {
            push @l, [ NUM_OF, undef ];
            substr($expression, 0, 1) = '';
        } elsif($expression =~ m/^'([^']*)'/) {
            # Empty string '' (the STRING branch above needs at least one character)
            push @l, [ STRING, $1 ];
            substr($expression, 0, length( $1 ) + 2) = '';
        } elsif($expression =~ m/^(\s+)/) {
            # Discard leading whitespace between tokens
            substr($expression, 0, length($1)) = '';
        } else {
            die "Invalid token encountered - remaining string is $expression";
        }
    }
    return @l;
}

sub filter_unique {
    my $self = shift;
    my $ilist = shift;
    my $nlist = [ ];
    
    my %seen = ( );
    foreach my $node (@$ilist) {
        push @$nlist, $node unless $seen{$node->serial};
        $seen{$node->serial} = 1;
    }
    
    return $nlist;
}

# Rec descent process of expression.
sub process {
    my $self = shift;
    my @nodes = @_;
    
    my $pt = $self->{parse_tree};
    my $ilist = [ @nodes ];
    
    while($pt && $pt->{action} ne 'end_select') {
        my $action = $pt->{action};
        my @args = ( );
        if($pt->{arguments}) {
            @args = @{$pt->{arguments}};
        }
        if($self->can($action)) {
            $ilist = $self->$action($ilist, @args);
            $ilist = $self->filter_unique($ilist);
        } else {
            warn "discarding '$action', can't do that";
        }
        $pt = $pt->{'next'};
    }
    return @$ilist;
}

sub select_name {
    my $self = shift;
    my $ilist = shift;
    my @names = @_;
    my $nlist = [ ];
    
    my %names = map { $_ => 1 } @names;
    
    for(my $i = 0; $i < @$ilist; $i ++) {
        if($names{$ilist->[$i]->type}) {
            push @$nlist, $ilist->[$i];
        };
    }
    return $nlist;
}

sub select_union {
    my $self = shift;
    my $class = ref $self;

    my $ilist = shift;
    my $left = shift;
    my $right = shift;
    
    my $l_path = $class->new('union left', $left);
    my $r_path = $class->new('union right', $right);
    
    my @l_result = $l_path->process(@$ilist);
    my @r_result = $r_path->process(@$ilist);
    
    return [ @l_result, @r_result ];
}

sub select_intersect {
    my $self = shift;
    my $class = ref $self;
    
    my $ilist = shift;
    my $left = shift;
    my $right = shift;
    
    my $l_path = $class->new("intersect left", $left);
    my $r_path = $class->new("intersect right", $right);
    
    my @l_result = $l_path->process(@$ilist);
    my @r_result = $r_path->process(@$ilist);
    
    my %seen = ( );
    my $nlist = [ ];
    foreach my $a (@l_result) {
        $seen{$a->serial} = 1;
    }
    foreach my $b (@r_result) {
        push @$nlist, $b if $seen{$b->serial};
    }
    
    return $nlist;
}

sub select_attr {
    my $self = shift;
    my $ilist = shift;
    my $name = shift;
    my $nlist = [ ];
    
    foreach my $i (@$ilist) {
        my $pv = $i->param($name);
        if($pv) {
            push @$nlist, $pv;
        }
    }
    return $nlist;
}

sub select_index {
    my $self = shift;
    my $ilist = shift;
    my $index = shift;
    
    if($index < scalar @$ilist) {
        return [ $ilist->[$index] ];
    } else {
        return [ ];
    }
}

sub match_expression {
    my $self = shift;
    my $ilist = shift;
    my $test_action = shift;
    my $invert = shift;
    my $exp = shift;
    my $r_exp = shift;
    
    my $op = shift; # Only for some operators
    
    my $nlist = [ ];
    foreach my $n(@$ilist) {
        my @t_list = $exp->process($n);
        my $t_result;
        # Allow for r_exp to be another expression - generate both
        # node lists if required.
        if( eval { $r_exp->can('process') } ) {
            my @r_list = $r_exp->process($n);
            $t_result = $self->$test_action(\@t_list, \@r_list, $op);
        } else {
            $t_result = $self->$test_action(\@t_list, $r_exp, $op);
        }
        $t_result = !$t_result if $invert;
        if($t_result) {
            push @$nlist, $n;
        }
    }
    return $nlist;
}

sub test_cmp_op {
    my $self = shift;
    my $l_list = shift;
    my $r_exp = shift;
    my $op = shift;
    
    if(scalar(@$r_exp) == 0 || eval { $r_exp->[0]->isa('Pod::Abstract::Node') }) {
        # combination test
        my $match = 0;
        foreach my $l (@$l_list) {
            my $lb = $l->body;
            $lb = $l->pod unless $lb;
            foreach my $r (@$r_exp) {
                my $rb = $r->body;
                $rb = $r->pod unless $rb;
                eval "\$match++ if \$lb $op \$rb";
                die $@ if $@;
            }
        }
        return $match;
    } elsif($r_exp->[0] == STRING) {
        # simple string test
        my $str = $r_exp->[1];
        my $match = 0;
        foreach my $l (@$l_list) {
            my $lb = $l->body;
            $lb = $l->pod unless $lb;
            eval "\$match++ if \$lb $op \$str";
            die $@ if $@;
        }
        return $match;
    } else {
        die "Don't know what to do with ", Dumper([$r_exp]);
    }
}

sub test_regexp {
    my $self = shift;
    my $t_list = shift;
    my $regexp_set = shift;
    my $regexp = $regexp_set->[0];
    my $case = $regexp_set->[1];
    if($case) {
        $regexp = qr/$regexp/;
    } else {
        $regexp = qr/$regexp/i;
    }

    my $match = 0;
    foreach my $t_n (@$t_list) {
        my $body = $t_n->body;
        $body = $t_n->pod unless defined $body;
        if($body =~ $regexp) {
            $match ++;
        }
    }
    return $match;
}

sub test_simple {
    my $self = shift;
    my $t_list = shift;
    
    return (scalar @$t_list) > 0;
}

sub select_children {
    my $self = shift;
    my $ilist = shift;
    my $nlist = [ ];
    
    foreach my $n (@$ilist) {
        my @children = $n->children;
        push @$nlist, @children;
    }
    
    return $nlist;
}

sub select_next {
    my $self = shift;
    my $ilist = shift;
    my $nlist = [ ];
    
    foreach my $n (@$ilist) {
        my $next = $n->next;
        if($next) {
            push @$nlist, $next;
        }
    }
    
    return $nlist;
}

sub select_prev {
    my $self = shift;
    my $ilist = shift;
    my $nlist = [ ];
    
    foreach my $n (@$ilist) {
        my $prev = $n->previous;
        if($prev) {
            push @$nlist, $prev;
        }
    }
    
    return $nlist;
}

sub select_parents {
    my $self = shift;
    my $ilist = shift;
    my $nlist = [ ];
    foreach my $n (@$ilist) {
        if($n->parent) {
            push @$nlist, $n->parent;
        }
    }
    
    return $nlist;
}

sub select_root {
    my $self = shift;
    my $ilist = shift;
    my $nlist = [ ];
    foreach my $n (@$ilist) {
        push @$nlist, $n->root; # almost certainly all the same - not
                                # efficient but consistent.
    }
    
    return $nlist;
}

sub select_current {
    my $self = shift;
    my $ilist = shift;
    return $ilist;
}

sub select_all {
    my $self = shift;
    my $ilist = shift;
    my $nlist = [ ];
    
    foreach my $n (@$ilist) {
        push @$nlist, $self->expand_all($n);
    }
    
    return $nlist;
}

sub expand_all {
    my $self = shift;
    my $n = shift;
    
    my @children = $n->children;
    my @r = ( );
    foreach my $c (@children) {
        push @r, $c;
        push @r, $self->expand_all($c);
    };
    
    return @r;
}

sub parse_path {
    my $self = shift;
    my $l = shift;
    
    my $left = $self->parse_l_path($l);
    
    # Handle UNION or INTERSECT operators
    my $next = shift @$l;
    if($next) {
        my $tok = $next->[0];
        if($tok == UNION) {
            return {
                action => "select_union",
                arguments => [ $left, $self->parse_path($l) ],
            };
        } elsif($tok == INTERSECT) {
            return {
                action => "select_intersect",
                arguments => [ $left, $self->parse_path($l) ],
            }
        } else {
            unshift @$l, $next;
            return $left;
        }
    } else {
        return $left;
    }
}


sub parse_l_path {
    my $self = shift;
    my $l = shift;
    
    my $next = shift @$l;
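    # Declare first: "my $x = ... if ..." has undefined behaviour in Perl.
    my ($tok, $val) = $next ? @$next[0, 1] : ();
    
    # Accept: / (children), // (all), name, [select], @attr, (index),
    # .. (parent), ^ (root), . (current), << and >> (siblings)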
    if(not defined $next) {
        return {
            'action' => 'end_select',
        };
    } elsif(grep { $tok == $_ } 
            (MATCHES, R_SELECT, S_CMP, N_CMP, UNION, INTERSECT)) {
        unshift @$l, $next;
        return {
            'action' => 'end_select',
        };
    } elsif($tok == CHILDREN) {
        return { 
            'action' => 'select_children',
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == ALL) {
        return {
            'action' => 'select_all',
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == NEXT) {
        return {
            'action' => 'select_next',
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == PREV) {
        return {
            'action' => 'select_prev',
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == PARENT) {
        return {
            'action' => 'select_parents',
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == ROOT) {
        return {
            'action' => 'select_root',
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == NOP) {
        return {
            'action' => 'select_current',
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == NAME) {
        my @extra_names = $self->parse_names($l);
        return {
            'action' => 'select_name',
            'arguments' => [ $val, @extra_names ],
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == ATTR) {
        return {
            'action' => 'select_attr',
            'arguments' => [ $val ],
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == INDEX) {
        return {
            'action' => 'select_index',
            'arguments' => [ $val ],
            'next' => $self->parse_l_path($l),
        };
    } elsif($tok == L_SELECT) {
        unshift @$l, $next;
        my $exp = $self->parse_expression($l);
        $exp->{'next'} = $self->parse_l_path($l);
        return $exp;
    } else {
        die "Unexpected token, ", Dumper([$next]);
    }
}

sub parse_names {
    my $self = shift;
    my $l = shift;
    my @r = ( );
    
    # Collect a list of names until there are no more.
    while(@$l && $l->[0][0] == NAME) {
        my $next = shift @$l;
        my $val = $next->[1];
        push @r, $val;
    }
    
    return @r;
}

sub parse_expression {
    my $self = shift;
    my $class = ref $self;
    my $l = shift;
    
    my $l_select = shift @$l;
    die "Expected L_SELECT, got ", Dumper([$l_select])
        unless $l_select->[0] == L_SELECT;
    
    # See if we lead with a NOT
    if($l->[0][0] == NOT) {
        shift @$l;
        unshift @$l, $l_select;
        
        my $exp = $self->parse_expression($l);
        $exp->{arguments}[1] = !$exp->{arguments}[1];
        return $exp;
    }
    
    my $l_exp = $self->parse_path($l);
    $l_exp = $class->new("select expression",$l_exp);
    my $op = shift @$l;
    my $op_tok = $op->[0];
    my $op_val = $op->[1];
    my $exp = undef;
    
    if($op_tok == MATCHES) {
        my $re = shift @$l;
        my $re_tok = $re->[0];
        my $re_str = $re->[1];
        my $case_sensitive = $re->[2];
        
        if($re_tok == REGEXP) {
            $exp = {
                'action' => 'match_expression',
                'arguments' => [ 'test_regexp', 0, 
                                 $l_exp, 
                                 [ $re_str, $case_sensitive ] ],
            }
        } else {
            die "Expected REGEXP, got ", Dumper([$re_tok]);
        }
    } elsif($op_tok == S_CMP || $op_tok == N_CMP) {
        my $rh = shift @$l;
        my $rh_tok = $rh->[0];
        my $r_exp = undef;
        
        if($rh_tok == STRING) { # simple string equality
            $r_exp = $rh;
        } else {
            unshift @$l, $rh;
            $r_exp = $self->parse_path($l);
            $r_exp = $class->new("select expression",$r_exp);
        }
        $exp = {
            action => 'match_expression',
            arguments => [ 'test_cmp_op', 0,
                           $l_exp, $r_exp, $op_val ],
        };
    } elsif($op_tok == R_SELECT) {
        # simple expression
        unshift @$l, $op;
        $exp = {
            'action' => 'match_expression',
            'arguments' => [ 'test_simple', 0, $l_exp ],
        }
    } else {
        die "Expected MATCHES, got ", Dumper([$op_tok]);
    }
    
    # Must match close of select;
    my $r_select = shift @$l;
    die "Expected R_SELECT, got, ", Dumper([$r_select])
        unless $r_select->[0] == R_SELECT;
    die "Failed to generate expression"
        unless $exp;
    
    # All OK!
    return $exp;
}

1;