Code::Splice - Injects the contents of one subroutine at a specified point elsewhere.


Code-Splice documentation Contained in the Code-Splice distribution.

Index


Code Index:

NAME

Top

Code::Splice - Injects the contents of one subroutine at a specified point elsewhere.

SYNOPSIS

Top

  use Code::Splice;

  Code::Splice::inject(
    code => sub { print "fred\n"; }, 
    package => 'main', 
    method => 'foo', 
    precondition => sub { 
      my $op = shift; 
      my $line = shift;
      $line =~ m/print/ and $line =~ m/four/;
    },
    postcondition => sub { 
      my $op = shift; 
      my $line = shift;
      $line =~ m/print/ and $line =~ m/five/;
    },
  );

  sub foo {
    print "one\n";
    print "two\n";
    print "three\n";
    print "four\n";
    print "five\n";
  }

DESCRIPTION

Top

Removes the contents of a subroutine (usually an anonymous subroutine created just for the purpose) and splices in into the program elsewhere.

Why, you ask?

Write stronger unit tests than the granularity of the API would otherwise allow
Write unit tests for nasty, interdependant speghetti code (my motivation -- hey, you gotta have tests before you can start refactoring, and if you can't write tests for the code, you're screwed)
Fix stupid bugs and remove stupid restrictions in other people's code in a way that's more resiliant across upgrades than editing files you don't own
Be what "aspects" should be
Screw with your cow-orkers by introducing monster heisenbugs
Play with self-modifying code
Write self-replicating code (but be nice, we're all friends here, right?)

The specifics:

The body of the code { } block are extracted from the subroutine and inserted in a place in the code specified by the call to the splice() function. Where the new code is spliced in, the old code is spliced out. The package and method arguments are required and tell the thing how to find the code to be modified. The code argument is required as it specifies the code to be spliced in. That same code block should not be used for anything else under penalty of coredump.

The rest of the argumets specify where the code is to be inserted. Any number of precondition and postcondition arguments provide callbacks to help locate the exact area to splice the code in at. Before the code can e spliced in, all of the precondition blocks must have returned true, and none of the postcondition blocks may have yet returned true. If a postcondition returns true before all of the precondition blocks have, an error is raised. Both blocks get called numerous times per line and get passed a reference to the B OP object currently under consideration and the text of the current line:

    precondition => sub { 
      my $op = shift; 
      my $line = shift;
      $line =~ m/print/ and $line =~ m/four/;
    },

... or...

    precondition => sub { my $op = shift; $op->name eq 'padsv' and $op->sv->sv =~ m/fred/; },

It's possible to insert code in the middle of an expression when testing ops, but when testing the text of the line of code, the spliced in code will always replace the whole line.

I'll probably drop sending in the opcode in a future version, at least for the precondition/postcondition blocks, or maybe I'll swap them to the 2nd arg so they're more optional.

Do not attempt to match text in comments as it won't be there. The code in $line is re-generated from the bytecode using B::Deparse and will vary from the original source code in a few ways, including changes to formatting, changes to some idioms and details of the expressions, and formatting of the code with regards to whitespace.

The splicing code will die if it fails for any reason. This will likely change in possible future versions.

There are also label and line arguments that create preconditions for you, for simple cases. Of course, you shouldn't use line for anything other than simple experimentation.

References to lexical variables in the code to be injected are replaced with references to the lexical variables of the same name in the location the code is inserted into. If a variable of the same name doesn't exist there, it's an error. ... but it probably shouldn't be an error, at least in the cases where the code being spliced in declares that lexical with my, or when the variable was initiailized entirely outside of the sub block being spliced in and was merely closed over by it.

See the comments in the source code (at the top, in a nice block) for my todo/desired features. Let me know if there are any features in there or yet unsuggested that you want. I won't promise them, but I would like to hear about them.

BUGS

Top

The original code reference passed in cannot be used elsewhere. It can't be called, and it should not be passed back to inject() again. Failure to heed these warnings will result in coredumps and strange behaviors.

Until I get around to finishing reworking B::Generate, B::Generate-1.06 needs line 940 of B-Generate-1.06/lib/B/Generate.c changed to read o = Perl_fold_constants(o); (the word Perl and an understore should be inserted). This is in order to build B::Generate-1.06 on newer Perls. I have a fixed and slightly extended version in my area on CPAN, if you search for SWALTERS.

Should gracefully default to not fixing up lexicals where no direct equivilent exists.

Should repair the provided subroutine reference so that if were to be accidentally called, Perl wouldn't coredump.

HISTORY

Top

0.1 -- initial release.

SEE ALSO

Top

http://search.cpan.org/~swalters/B-Generate-1.06_1/ -- slightly updated B::Generate -- you'll need this

http://perldesignpatterns.com/?PerlAssembly attempts to document the Perl internals I'm prodding so bluntly.

AUTHORS

Top

Scott Walters scott@slowass.net - http://slowass.net/

Brock Wilcox awwaiid@thelackthereof.org - http://thelackthereof.org/

Code lifted from various B modules...

COPYRIGHT AND LICENSE

Top


Code-Splice documentation Contained in the Code-Splice distribution.

package Code::Splice;

use 5.008;
use strict;
use warnings;

our $VERSION = '0.01';

<<comment;

Todo:

* Change the nextstate instructions in the code as we paste it:
  Line number should be where it's inserted at, but filename should have info about the code
  having been spliced.
* Option about whether to splice out the matching op or append/prepend to it.
* Option about whether to splice into an expression or to splice only at a nextstate/at the line level.
* Positional argument syntax, where arguments to the code being replced can be re-spliced into the
  user provided code (needed to do real macroy stuff)
* Feature where certain subroutine names (or subroutines tagged with a certain attribute)
  get replaced with their definitions at each point they appear, with their arguments spliced in

comment

use B qw< OPf_KIDS OPf_STACKED OPf_WANT OPf_WANT_VOID OPf_WANT_SCALAR OPf_WANT_LIST OPf_REF OPf_MOD OPf_SPECIAL OPf_KIDS >;
use B qw< OPpTARGET_MY ppname>; 
use B qw< SVf_IOK SVf_NOK SVf_POK SVf_IVisUV >;
use B::Generate;
use B::Concise;
use B::Deparse;
# use B::Utils;
sub SVs_PADMY () { 0x00000400 }     # use B qw< SVs_PADMY >;

use strict;
use warnings;

sub OPfDEREF    () { 32|64 } # #define OPpDEREF                (32|64) /* autovivify: Want ref to something: */
sub OPfDEREF_AV () { 32 }    # #define OPpDEREF_AV             32      /*   Want ref to AV. */
sub OPfDEREF_HV () { 64 }    # #define OPpDEREF_HV             64      /*   Want ref to HV. */
sub OPfDEREF_SV () { 32|64 } # #define OPpDEREF_SV             (32|64) /*   Want ref to SV. */

#
# debugging
#

my $debug;
# use Data::Dumper 'Dumper'; # debug
# use Carp 'confess';
# BEGIN { $SIG{USR1} = sub { use Carp; print confess("crap."); exit; }; };

#
# api
#

sub inject {

    my %args = @_;
    my $code = delete $args{code};          # what to insert
    my $package = delete $args{package};    # where to insert it
    my $method = delete $args{method};

    # user-provided arrays-of-code specifications of where to inject at

    my $preconditions = delete $args{preconditions} || [ ];
    my $postconditions = delete $args{postconditions} || [ ];

    for(my $i = 0; $i < @_; $i += 2) { 
        $_[$i] eq 'precondition' and push @$preconditions, $_[$i+1];
        $_[$i] eq 'postcondition' and push @$postconditions, $_[$i+1];
    }

    delete $args{precondition};
    delete $args{postcondition};

    # specifications with which to build 

    my $line = delete $args{line};
    my $label = delete $args{label};

    $debug = delete $args{debug};

    %args and die "unknown arguments: " . join ', ', keys %args;

    UNIVERSAL::isa($code, 'CODE') or die;

    # Build list of conditions that must be true for the injection and list of things which cannot be true

    $line and push @$preconditions, sub {
        my $op = shift;
        $op->name eq 'nextstate' or return;
        $line and $op->line == $line or return;
        return 1;
    };

    $label and push @$preconditions, sub {
        my $op = shift;
        $op->name eq 'nextstate' or return;
        $line and $op->label eq $label or return;
        return 1;
    };

    # Look up the method we're supposed to insert into

    my $cv = do { no strict 'refs'; B::svref_2object(*{$package.'::'.$method}{CODE} or die "no such package/method"); }; 
    $cv->ROOT() or die "no code in $package\::method";
    $cv->STASH()->isa('B::SPECIAL') and die; # Can't locate object method "NAME" via package "B::SPECIAL"
    $cv->ROOT()->can('first') or die "$package\::$method cannot do ->ROOT->first\n"; 

    # Ready the code we're support to inject

    # Code we're to insert should have a structure as follows:

    # 5  <1> leavesub[1 ref] K/REFC,1 ->(end)
    # -     <@> lineseq KP ->5
    # 1        <;> nextstate(splice 38 splice.pm:99) v/2 ->2
    # 4        <@> print sK ->5
    # 2           <0> pushmark s ->3
    # 3           <$> const(PV "test!!\n") s ->4

    # We want the nextstate and all of its siblings (print, another nextstate perhaps, more stuff...)

    my $newcv = B::svref_2object($code);
    my $newop = $newcv->ROOT;                            # $newop points to a leavesub instruction
    $newop->name eq 'leavesub' or die;
    my $newopfirst = $newop->first->first;  $newopfirst = $newopfirst->has_sibling if $newopfirst->has_sibling and $newopfirst->name  eq 'nextstate'; # was causing coredumps when the nextstate was inserted into the wrong place
    my $newoplast = do { my $x = $newopfirst; $x = $x->has_sibling while $x->has_sibling; $x; };

    # Get ready to recurse through the bytecode tree - build a reverse index, previous, from the next
    # links and do any debugging output after we traverse the tree

    my @nonrootpad = ($cv->PADLIST->ARRAY)[0]->ARRAY;

    my $redo_reverse_indices = sub {
        my $siblings = { };
        walkoptree_slow($cv->ROOT, sub { 
            my $self = shift;       return unless $self and $$self;
            my $next = $self->next; 
            my $sibl = $self->can('sibling') ? $self->sibling : undef;
            $siblings->{$$sibl} = $self if $sibl and $$sibl;
        });
        return $siblings;
    };

    my $siblings = $redo_reverse_indices->();

    # build a table of deparsed code to line number

    my @codelines;
    
    walkoptree_slow($cv->ROOT, sub {

        my $op = shift;
        return if $op->isa('B::NULL');
        return unless $op->name eq  'nextstate';

        my $line = $op->line or die;
        $op = $op->sibling;
        return if $op->isa('B::NULL');

        my $dp = B::Deparse->new;
        $dp->{curcv} = $cv;
        $debug and print "debug: deparse: $line: ", $dp->deparse($op, 0), "\n";
        $codelines[$line] = $dp->deparse($op, 0);

    });

    # debugging for before we modify anything

    $debug and do { print "\n\nbefore:\n"; B::Concise::concise_cv_obj('basic', $cv); }; # dump the opcode tree of this code value

    # identify the pointcut and insert the target code in right there

    my $curcop;
    my $codeline;

    my $look_for_things_to_diddle = sub {
     
        my $op = shift or die;       # op object
        my $level = shift;
        my $parents = shift or die;
    
        return unless $op and $$op;
        return if $op->isa('B::NULL');

        $debug and print "debug: look_for_things_to_diddle: doing an ", $op->name, "\n";
    
        return unless exists $parents->[0]; # root op isn't that interesting and we need a parent
        my $parent = $parents->[-1];
    
        my $pointcut = sub {
    
            # When splicing bytecode, we must consider: parent's first, parent's last, our previous sibling, our next sibling
            # print "modifying ", $op->name, " at addresss ", $$op, "\n";
    
            # XXX alternate between the two according to some test
            # XXX rewrite the graphed in code so that targs refer to lexicals in the scope of the CV they're being spliced into
    
            my $prev_sibling = $siblings->{$$op}; # may be undef
            my $next_sibling = $op->sibling;      # may be undef
    
            $prev_sibling->sibling($newopfirst) if $prev_sibling and $$prev_sibling;
            $newoplast->sibling($op->sibling) if $op->sibling and ${$op->sibling};
    
            $debug and print "debug: splicing code, I think the parent is a ", $parent->name, "\n";
    
            $parent->first($newopfirst) if $parent->can('first') and ${$parent->first} == $$op;
            $parent->last($newoplast) if $parent->can('last') and ${$parent->last} == $$op;
    
            $siblings = $redo_reverse_indices->(); # only one swath of code is injected at a time, so this isn't currently needed

            # One chunk of bytecode can only be spliced into one place unless we make a deep copy of it,
            # which we don't know how to do yet, so we just bail.  

            goto did_pointcut;
    
        };

        $curcop = $op if $op->name eq 'nextstate';
        $codeline = $codelines[$curcop->line] if $curcop and defined $codelines[$curcop->line];
 
        for my $post (@$postconditions) {
            if($post->($op, $codeline)) {
                die "post condition true before insert point found: ". B::Deparse->new->coderef2text($post);
            }
        }

        for my $i (0 .. @$preconditions-1) {
            if($preconditions->[$i]->($op, $codeline)) {
                splice @$preconditions, $i, 1, ();
            }
        }
    
        if(! @$preconditions) {
            $op = $op->has_sibling if $op->has_sibling and $op->name eq 'nextstate';
            $pointcut->();
            goto did_pointcut;
        }
    
        return;
    
    };

    walkoptree_slow($cv->ROOT, $look_for_things_to_diddle);
    die "pointcut failed";
    did_pointcut:

    fix($cv->ROOT->first, $cv->ROOT);

    $debug and do { print "\n\nafter:\n"; B::Concise::concise_cv_obj('basic', $cv); }; # dump the opcode tree of this code value

    # Translate the spliced-in code's idea of lexicals to match where it's spliced in to

    my @srcpad = lexicals($newcv);
    my @destpad = lexicals($cv); 

    my %destpad = map { ( $destpad[$_] => $_ ) } grep defined $destpad[$_], 0 .. $#destpad; # build a name-to-number index

    # map { ( $_ => $padnames[$_]->PVX) }  grep { ! $padnames[$_]->isa('B::SPECIAL') } 0 .. $#padnames;

    $debug and do { print "debug: srcpad: ", join ', ', map $_||'(undef)', @srcpad; print "\n"; };
    $debug and do { print "debug: destpad: ", join ', ', map $_||'(undef)', @destpad; print "\n"; };

    walkoptree_slow($cv->ROOT, sub {
        my $op = shift or die;       # op object
        $op->can('targ') or return;  # B::NULL cannot
        $srcpad[$op->targ] or return;
        $debug and print "debug: ", $op->name, " references pad slot ", $op->targ, " which contains ", $srcpad[$op->targ]||'', "\n";
        exists $destpad{$srcpad[$op->targ]} or die "variable ``$srcpad[$op->targ]'' doesn't exist in target context";
        $op->targ($destpad{$srcpad[$op->targ]});
        # print "debug: variable name: $srcpad[$op->targ]\n";
        # print "debug: index of same variable in dest: ", $destpad{$srcpad{$op->targ}}, "\n";
    });

    return 1;
}


#
# utility methods
#

my @parents = ();

sub walkoptree_slow {
    # actually recurse the bytecode tree
    # stolen from B.pm, modified
    my $op = shift;
    my $sub = shift;
    my $level = shift;
    $level ||= 0;
    # warn(sprintf("walkoptree: %d. %s\n", $level, peekop($op))) if $debug;
    $sub->($op, $level, \@parents);
    if ($op->can('flags') and $op->flags() & OPf_KIDS) {
        # print "debug: go: ", '  ' x $level, $op->name(), "\n"; # debug
        push @parents, $op;
        my $kid = $op->first();
        my $next;
        next_kid:
          # was being changed right out from under us, so pre-compute
          $next = 0; $next = $kid->sibling() if $$kid;
          walkoptree_slow($kid, $sub, $level + 1);
          $kid = $next;
          goto next_kid if $kid;
        pop @parents;
    }
    if (B::class($op) eq 'PMOP' && $op->pmreplroot() && ${$op->pmreplroot()}) {
        # pattern-match operators
        push @parents, $op;
        walkoptree_slow($op->pmreplroot(), $sub, $level + 1);
        pop @parents;
    }
};

sub fix {
    my ($op, $parent) = @_;
    $debug and print "fixing: ", $$op ? $op->name : '(null)', "\n";
    if($op->isa('B::NULL')) {
        $debug and print "skipping null\n";
        #return fix($op->first, $parent);
        return $op;
    }
    # $op = denull($op);
    if($op->has_sibling) {
        $debug and print "has sibling, fixing and hooking\n";
        $op->next(fix($op->has_sibling, $parent));
    } else {
        $debug and print "no sibling, hooking to parent (if applicable)\n";
        $op->next($parent) if $parent;
    }
    if($op->has_first) {
        $debug and print "Fixing children, and getting lastmost first\n";
        return fix($op->has_first, $op);
    } else {
        $debug and print "No kids... we are the lastmost first!\n";
        return $op;
    }
}

sub B::OP::has_sibling {
    my $op = shift;
    # eval { warn 'has_sibling: ' . $op->sibling; };
    return unless $op->can('sibling') and $op->sibling and ${$op->sibling}; #  and ref $op->sibling ne 'B::NULL';
    return denull($op->sibling);
}

sub B::OP::has_first {
    my $op = shift;
    # eval { warn 'has_first: ' . $op->first; };
    return unless $op->can('first') and $op->first and ${$op->first}; #  and ref $op->first ne 'B::NULL';
    return denull($op->first);
}

sub denull {
    my $op = shift;
    if( $op->isa('B::NULL') ) {
        return denull($op->first);
    } else {
        return $op;
    }
}

sub lexicals {
    my $cv = shift;
    # map { ( $_ => $padnames[$_]->PVX) }  grep { ! $padnames[$_]->isa('B::SPECIAL') } 0 .. $#padnames;
    map { $_->isa('B::SPECIAL') ? undef : $_->PVX } ($cv->PADLIST->ARRAY)[0]->ARRAY;
}

1;


__END__

        #for my $i ( 0 .. $#padnames ) {
        #    $padnames[$i]->isa('B::SPECIAL') and next;
        #    use Data::Dumper; print $i, ' ', $padnames[$i]->PVX, "\n"; # Dumper $padnames[0];
        #}
    # B::main_root()->linklist();
        # walk_topdown($cv->ROOT, sub { $_[0]->concise($_[1]) }, 0); # in B::Concise
        # goto not_pointcut unless $op->name() eq 'padav' or $op->name() eq 'padhv';
        # goto not_pointcut unless OPf_WANT_SCALAR == ($op->flags() & OPf_WANT);
        # goto not_pointcut if $op->flags & OPf_REF; # things like 'exists' want a ref
        # XXX programmable selector
    # our $testcv = B::svref_2object(sub { print "test!!\n"; });
    # B::Concise::concise_cv_obj('basic', $testcv); # dump the opcode tree of this code value
    # $newop = $newop->first->first->sibling if $newop->name eq 'leavesub'; # seek to the print
    my %srcpad = do {
        my @padnames = ($cv->PADLIST->ARRAY)[0]->ARRAY;
        map { ( $_ => $padnames[$_]->PVX) }  grep { ! $padnames[$_]->isa('B::SPECIAL') } 0 .. $#padnames;
    };
    # map { ( $_ => $padnames[$_]->PVX) }  grep { $padnames[$_]->can('FLAGS') and $padnames[$_]->FLAGS & SVs_PADMY } 0 .. $#padnames;
   # B::Concise::concise_cv_obj(0, $cv); # just to register the cv so when it goes to pick things out of the pad, it can
   # B::Concise::walk_topdown($op, sub { $_[0]->concise($_[1]) }, 0); 
                print $dp->indent($dp->deparse($op, 0));

        my $stringrep = do {
            my $cachedstringrep;
            sub {
                again: $cachedstringrep and return $cachedstringrep;
                my $leave = B::LISTOP->new('leave', OPf_WANT_LIST | OPf_KIDS, 0, 0);
                my $dp = B::Deparse->new;
                $dp->init;
                $dp->{curcv} = $cv;
                my $save_sibling = $op->sibling;
                my $save_next = $op->next;
                $leave->first($op);
                $leave->last($op);
                $op->sibling(0);
                $op->next($leave);
                print "start deparse:\n";
	            print $dp->deparse($op, 0);
                print "\nend deparse\n\n";
                $op->sibling($save_sibling);
                $op->next($save_next);
                goto again;
            };
        };


Notes:

It currently crawls deeply into the bytecode rather than just walking down the top level
even though the pointcut thingie only allows line level resolution on modification right
now.  The pointcut interface is just for demonstration only right now.  Something
more useful might take a list of constraints along the lines of:

* After a variable of a given name is declared
* After/before a variable of a given name is assigned to
* After/before a method call of a specific name
* After/before a variable of a given name is used as an argument in a method call to a method of a specific name
* After/before a specific operation, such as print, close, etc

Done:

* Change instructions to use the pad of the routine they got moved into:
  Lookup variable names in the anonsub, find variables of the same names in the target sub's pad,
  and change the targ to match.