Regexp::NamedCaptures - Saves capture results to your own variables


Regexp-NamedCaptures documentation Contained in the Regexp-NamedCaptures distribution.

NAME

Regexp::NamedCaptures - Saves capture results to your own variables

VERSION

Version 0.04

SYNOPSIS

 use Regexp::NamedCaptures;
 my ( $name, $title, $first, $last );
 /(?<\$name>(?<\$title>Mr\.|Ms\.) (?<\$first>\w+) (?<\$last>\w+))/;

 # is the same as

 my ( $name, $title, $first, $last )
     = /((Mr\.|Ms\.) (\w+) (\w+))/;

 # use re 'eval' when interpolating
 use Regexp::NamedCaptures;
 use re 'eval';
 /(?<\$something>$pattern)/;

DESCRIPTION

This highly experimental module implements named captures for perl 5.8. Perl 5.10 and later have built-in named captures, so on those versions you should not use this module.

When your regular expression captures something, the result can be copied automatically to a location you choose. This is an improvement over plain perl because you no longer have to deal with positional captures. When an expression is complex, with multiple or nested captures, it really helps not to have to track which numbered variable holds your data.
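For comparison, here is the SYNOPSIS example written with positional captures and, on perl 5.10 or later, with the built-in named captures that make this module unnecessary. Built-in named groups are read back from the %+ hash. This sketch uses only core perl, not this module; the input string is illustrative.

```perl
use strict;
use warnings;
use 5.010;    # built-in named captures need perl 5.10 or later

my $text = 'Ms. Ada Lovelace';

# Positional captures: you count parentheses to know which is which.
my ( $name, $title, $first, $last ) = $text =~ /((Mr\.|Ms\.) (\w+) (\w+))/;

# Built-in named captures: each group is read back from %+ by name.
my %named;
if ( $text =~ /(?<name>(?<title>Mr\.|Ms\.) (?<first>\w+) (?<last>\w+))/ ) {
    %named = %+;    # copy it, because %+ changes on the next match
}

print "$title $last\n";                  # prints "Ms. Lovelace"
print "$named{title} $named{last}\n";    # prints "Ms. Lovelace"
```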

NAMED CAPTURE SYNTAX

I have borrowed the syntax from .NET. I'm told that the following two forms are equivalent, so I've treated them identically.

 (?<name>pattern)
 (?'name'pattern)

name is a piece of perl code saying where the captured text should go. It may be arbitrary perl code, but it must be a valid lvalue, such as $scalar or $hash{key}. In an ordinary, interpolating regular expression you write (?<\$something>...) to have the result copied to the $something variable. The backslash is needed because perl interpolates your variables just as it always does; without it, the current value of $something would be substituted into the pattern.

pattern is an ordinary regular expression. It may itself contain further named captures, which are converted as well.
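As the module source on this page shows, the split between name and pattern is done with the core Text::Balanced module: extract_bracketed for the (?<name>...) form and extract_quotelike for the (?'name'...) form. This standalone sketch mirrors those steps of _convert_foo_expr on a hypothetical input; it does not call into the module itself.

```perl
use strict;
use warnings;
use Text::Balanced qw( extract_bracketed extract_quotelike );

# Inner text of (?<$x>\w+) and (?'$x'\w+) once the leading "(?" and the
# trailing ")" have been removed, which is _convert_foo_expr's first step.
my $chevron = '<$x>\w+';
my $quoted  = q{'$x'\w+};

# Split each into its delimited name and the remaining pattern.
my ( $name,  $expr )  = extract_bracketed( $chevron, '<>' );
my ( $qname, $qexpr ) = extract_quotelike($quoted);

# Drop the one-character delimiters around each name.
for ( $name, $qname ) {
    substr $_, 0,  1, '';
    substr $_, -1, 1, '';
}

# Build the plain group followed by the copying code block.
print "($expr)(?{$name=\$^N})\n";    # prints (\w+)(?{$x=$^N})
```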

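Each named capture is rewritten as:

 (pattern)(?{ name = $^N })

$^N holds the text of the most recently closed group, so the code block copies whatever pattern just matched into name. In addition, convert() prepends a single block of the form (?{ name1 = name2 = undef }) to the whole expression, so every target is reset before a match attempt.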
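Written out by hand in plain perl (no module needed), the rewrite looks like the sketch below. It uses hash elements as targets to show that any lvalue works, and includes the clearing block that convert() prepends. The %person hash and the input string are illustrative only.

```perl
use strict;
use warnings;

my %person;
my $text = 'Mr. Alan Turing';

# Hand-written equivalent of
#   /(?<\$person{title}>Mr\.|Ms\.) \w+ (?<\$person{last}>\w+)/
# The first block clears the targets (as convert() does); each named
# group then becomes a plain group followed by a block copying $^N,
# the text of the most recently closed group, into its target.
my $matched = $text =~ /
    (?{ $person{title} = $person{last} = undef })
    (Mr\.|Ms\.) (?{ $person{title} = $^N })
    [ ] \w+ [ ]
    (\w+) (?{ $person{last} = $^N })
/x;

print "$person{title} $person{last}\n" if $matched;    # prints "Mr. Turing"
```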

FUNCTIONS

$rewritten_regexp = convert( $original_regexp )

This function does all the work of converting a regular expression containing named captures into an expression that perl can compile. You only need it if you are creating regular expressions at runtime. It is not exported, so call it by its full name.

 use re 'eval';
 $re = Regexp::NamedCaptures::convert( '(?<$var>...)' );
 $re = qr/$re/;

 use re 'eval';
 $re = Regexp::NamedCaptures::convert( "(?'\$var'...)" );
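To illustrate what convert() hands back, this sketch assumes, from reading the module source on this page, that convert('(?<$main::x>\w+)') returns '(?{$main::x=undef})(\w+)(?{$main::x=$^N})'. The string is hard-coded so the example runs without the module installed, and the target is fully qualified to sidestep scoping questions. Compiling it still needs use re 'eval', because the code blocks arrive by interpolation.

```perl
use strict;
use warnings;

our $x;    # the target; $main::x below names the same variable

# Assumed output of Regexp::NamedCaptures::convert('(?<$main::x>\w+)'),
# worked out by reading the module source, not produced by calling it.
my $converted = '(?{$main::x=undef})(\w+)(?{$main::x=$^N})';

my $re = do {
    use re 'eval';    # required: the (?{ }) blocks arrive by interpolation
    qr/$converted/;
};

print "$x\n" if 'hello world' =~ $re;    # prints "hello"
```

convert() returns a plain string, so turning it into a compiled pattern with qr// is left to the caller.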

use re 'eval' AND SECURITY

This module works by inserting (?{ code }) blocks into your expression. As a security feature, perl does not allow (?{ ... }) blocks that are interpolated into a pattern at runtime, that is, once BEGIN time has passed, to be compiled unless the programmer explicitly lifts that restriction with the use re 'eval' pragma.
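A minimal sketch of that restriction in plain perl, without this module: a (?{ 1 }) block arriving through an interpolated string is refused until use re 'eval' is in lexical scope. The $fragment string is only an illustration.

```perl
use strict;
use warnings;

my $fragment = '(?{ 1 })';    # a code block arriving as a plain string

# Without the pragma, perl refuses to compile the interpolated block.
my $refused = !eval { my $re = qr/a$fragment/; 1 };

# With use re 'eval' in lexical scope, the same pattern compiles.
my $allowed = eval {
    use re 'eval';
    my $re = qr/a$fragment/;
    1;
};

print "refused without the pragma\n" if $refused;
print "compiled with the pragma\n"   if $allowed;
```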

If you trust all of the expressions that you're interpolating, you can use this safely. If you are accepting regular expressions from sources you might not trust, you should not enable use re 'eval'.

If you still want to use this module in that situation, see whether you can move your regular expression compilation earlier, to BEGIN time.

Consider these two examples:

 use re 'eval';
 $rx = qr/(?<\$name>$expr)/;

 BEGIN {
     $rx = qr/(?<\$name>$expr)/;
 }

The first one requires the use re 'eval' pragma because the interpolation and compilation occur at runtime. The second does not, because the pattern is interpolated and compiled at BEGIN time. It suffers the obvious drawback that the value of $expr must be known at BEGIN time instead of at runtime.

AUTHOR

"Joshua ben Jore" <jjore@cpan.org>

BUGS

\Q escapes are completely ignored. If you use one to keep something that looks like a named capture from being parsed as one, it won't work.

Please report any bugs or feature requests to bug-regexp-namedcaptures@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Regexp-NamedCaptures. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

ACKNOWLEDGEMENTS

Jeffrey Friedl's book Mastering Regular Expressions for the original inspiration. perlre for making it possible. Minneapolis.pm for giving me a reason to create this. Tanktalus, Ctrl-z, and others of perlmonks.org.

COPYRIGHT & LICENSE

package Regexp::NamedCaptures;

use warnings;
use strict;

# $^N didn't appear in perlvar until then.
use 5.007_01;

use Text::Balanced qw( extract_bracketed extract_quotelike );
use Carp qw( croak carp );

# Attempt to load a real Params::Validate, or create a fake one if
# the user doesn't have it.
BEGIN {
    eval {
        require Params::Validate;
        Params::Validate->import( 'validate_pos', 'SCALAR', 'UNDEF',
            'CODEREF' );
    };

    if ($@) {
        eval(     'sub validate_pos (\@@) { @{$_[0]} }'
                . 'sub SCALAR () { 0 }'
                . 'sub UNDEF () { 0 }'
                . 'sub CODEREF () { 0 }' );
    }
}

# Predeclare these so I can call them without parentheses and so
# perl will help me notice if I've misspelled them at *compile*
# time.
sub convert;
sub _convert_foo_expr;
sub _convert_chevron_expr;
sub _convert_quote_expr;

our $VERSION = '0.05';

sub convert {
    my ($in) = validate_pos( @_, { type => SCALAR | UNDEF } );

    if ( not defined $in ) {
        $in = '';
        carp "Use of uninitialized value in regexp compilation";
    }

    my @targets;
    my $out = '';
    while ( length $in ) {

        # Seek $in forward until a (?< or (?' is found. Be sure to
        # exclude (?<! and (?<= because they are normal regexp
        # patterns.
        if ( $in !~ /\((?:(?=\?<[^!=])|(?=\?\'))/ ) {

            # Nothing was found - copy the rest of $in to $out and
            # empty $in.
            $out .= $in;
            $in = '';
        }
        else {

            # Copy any leading text directly to the output.
            $out .= substr $in, 0, $-[0], '';

            my $expr;
            ( $expr, $in ) = extract_bracketed $in, '()';

            my $target;
            ( $target, $expr ) = (
                  '(?<' eq substr( $expr, 0, 3 ) ? _convert_chevron_expr $expr
                : '(?\'' eq substr( $expr, 0, 3 ) ? _convert_quote_expr $expr
                : croak "Invalid escape sequence in $expr."
            );
            $out .= $expr;

            push @targets, $target;
        }
    }

    if (@targets) {

        # Prepend target clearing code.
        $out = "(?{" . join( '=', @targets ) . "=undef})$out";
    }
    return $out;
}

sub _convert_quote_expr {
    my ($re) = validate_pos(
        @_,
        {   type  => SCALAR,
            regex => qr<\A\(\?\'.+\'.*\)\z>s
        }
    );
    _convert_foo_expr sub {&extract_quotelike}, $re;
}

sub _convert_chevron_expr {
    my ($re) = validate_pos(
        @_,
        {   type  => SCALAR,
            regex => qr<\A\(\?\<.+\>.*\)\z>s
        }
    );
    _convert_foo_expr sub { extract_bracketed shift, '<>' }, $re;
}

sub _convert_foo_expr {
    my ( $extract, $in ) = validate_pos(
        @_,
        { type => CODEREF },
        {   type  => SCALAR,
            regex => qr<^\(\?..+..*\)$>s
        }
    );

    # Zap the (? and ) parts of (?...) away.
    substr $in, 0,  2, '';
    substr $in, -1, 1, '';

    # Split the _NAME_ part from the EXPR part of _NAME_EXPR
    my ( $name, $expr ) = $extract->($in);

    # Possibly transform the contents of $expr if it contained some
    # (?<...>...) expressions.
    $expr = convert $expr;

    # Zap the delimiters on _NAME_
    substr $name, 0,  1, '';
    substr $name, -1, 1, '';

    # Unescape stuff in $name
    $name =~ s/\\(.)/$1/gs;

    # Rewrite the expression so it's a plain capture followed by a
    # code block.
    return ( $name => "($expr)(?{$name=\$^N})" );
}

#####################################################################
#####################################################################

# Overload magic follows

use overload(
    '.'  => \&_concat,
    '""' => \&_finalize
);

sub import {

    # Constants are overloaded so their fragments are passed to
    # _postpone().
    overload::constant 'qr' => \&_postpone;
}

sub _postpone {

    # _postpone returns an object.
    my ($re) = @_;

    # If I was given an undef, pass the error back to the right
    # place. Without this, the user is going to get an error about an
    # undefined value in *my* code. Blech.
    if ( not defined $re ) {
        carp "Use of uninitialized value in regexp compilation";
        $re = '';
    }
    return bless \$re, __PACKAGE__;
}

sub _concat {

    # _concat happens anytime something is interpolated. It
    # re-postpones things until later.

    my ( $left, $right, $inverted ) = @_;
    ( $left, $right ) = ( $right, $left ) if $inverted;

    for my $tgt ( $left, $right ) {
        $tgt = $$tgt if ref($tgt) eq __PACKAGE__;

        # As in _postpone, I want to pass this warning off as my
        # caller's problem and not a problem with
        # Regexp::NamedCaptures.
        if ( not defined $tgt ) {
            carp "Use of uninitialized value in concatenation (.) or string";
            $tgt = '';
        }
    }

    my $re = "$left$right";
    return bless \$re, __PACKAGE__;
}

sub _finalize {

    # _finalize happens when the regex is due to be compiled. Here, I
    # just rethrow the regex to the user-accessible function
    # convert().

    return convert ${ $_[0] };
}

"Read more smut.";