| Git-FastExport documentation | Contained in the Git-FastExport distribution. |
Git::FastExport::Stitch - Stitch together multiple git fast-export streams
# create a new stitch object
my $export = Git::FastExport::Stitch->new();
# stitch in several git fast-export streams
# a git directory
$export->stitch( A => 'A' );
# a Git repository object
$export->stitch( Git->repository( Directory => 'B' ) => 'B' );
# a Git::FastExport object
$export->stitch( Git::FastExport->new('C') => 'C' );
# output the stitched stream
while ( my $block = $export->next_block() ) {
print $block->as_string();
}
Git::FastExport::Stich is a module that "stitches" together several
git fast-export streams. This module is the core of the git-stitch-repo
utility.
Git::FastExport::Stitch objects can be used as Git::FastExport,
since they support the same inteface for the next_block() method.
Git::FastExport::Stitch supports the following methods:
Create a new Git::FastExport::Stitch object.
The options hash defines options that will be used during the creation of the stitched repository.
The select option defines the selection algorithm to be used when the last alien child
algorithm reaches a branch point. Valid values are: first, last and random. The
default value is last.
See STITCHING ALGORITHM for details about what these options really mean.
The remaining parameters (if any) are taken to be parameters (passed by
pairs) to the stitch() method.
Add the given $repo to the list of repositories to stitch in.
$repo can be either a directory, or a Git object (both will
be used to instantiate a Git::FastExport object) or directly a
Git::FastExport object.
The optional $dir parameter will be used as the relative directory
under which the trees of the source repository will be stored in the
stitched repository.
Return the next block of the stitched repository, as a
Git::FastExport::Block object.
Return nothing at the end of stream.
Git::FastExport::Stitch processes the input commits in --date-order
fashion, and builds the new graph by attaching the new commit to another
commit of the graph being constructed. It starts from the "original"
parents of the node, and tries do follow the graph as far as possible.
When a commit has several suitable child commits, it needs to make a selection. There are currently three selection algorithms:
Pick the last child commit, i.e. the most recent one. This is the default.
Pick the first child commit, i.e. the oldest one.
Pick a random child.
Imagine we have two repositories A and B that we want to stitch into a repository C so that all the files from A are in subdirectory A and all the files from B are in subdirectory B.
Note: in the following ASCII art graphs, horizontal order is chronological.
Repository A:
,topic ,master
,-A3------A5--A6
/ /
A1--A2------A4'
Branch master points to A5 and branch topic points to A3.
Repository B:
,topic ,master
,-B3------B5------B7--B8
/ /
B1--B2------B4------B6'
Branch master points to B8 and branch topic points to B5.
The RESULT repository should preserve chronology, commit relationships and branches as much as possible, while giving the impression that the directories A/ & B/ did live side-by-side all the time.
Assuming additional timestamps not shown on the above graphs
(the commit order is A1, B1, A2, B2, A3, A4, B3, B4, A5, B5, B6, B7, B8, A6),
Git::FastExport::Stitch will produce a git-fast-import stream that will
create the following history, depending on the value of --select:
,topic-B
,-B3----------B5----.
/ \ ,master-B
A1--B1--A2--B2------A4------B4--A5------B6--B7---B8--A6
\ / `master-A
`-A3------------'
`topic-A
,---------B4----------B6-.
/ \ ,master-B
A1--B1--A2--B2--A3------B3------A5--B5------B7---B8--A6
\ `topic-A / `topic-B `master-A
`-----A4--------'
In this example, there are only two places where the selection process is triggered, and there are only two items to choose from each time. Therefore the random selection algorithm will produce 4 possible different results.
In addition to the results shown above (last+last and first+first),
we can also obtain the two following graphs:
first+last:
,topic-A ,master-B
A1--B1--A2--B2--A3--------------A5------B6--B7---B8--A6
\ / / `master-A
`-----A4------B4' B5----'
\ / `topic-B
`-B3--------'
last+first:
,master-B
A1--B1--A2--B2------A4------B4----------B6--B7---B8--A6
\ \ / `master-A
\ `-B3------A5--B5----'
\ / `topic-B
A3------------'
`topic-A
Any mathematician will tell you there are many many ways to stitch two DAG together. This programs tries very hard not to create inconsistent history with regard to each input repository.
The algorithm used by Git::FastExport::Stitch enforces the following
rules when building the resulting repository:
The current implementation can probably be improved, and more options added. I'm very interested in test repositories that do not give the expected results.
To run the stitching algorithm, Git::FastExport::Stitch makes
use of several internal methods. These are not part of the public
interface of the module, and are detailed below for those interested in
the algorithm itself.
Given a repo key in the internal structure listing all the repositories to stitch together, this method "translates" the current block using the references (marks) of the resulting repository.
To ease debugging, the translated mark count starts at 1_000_000.
Add the given parents to the node, and update the internal structure containing the node lineage.
Given a node, its "branch" name (actually, the reference given on the
commit line of the fast-export) and a structure describing it's
lineage over the various source repositories, find a suitable commit to
which attach it.
This method is the heart of the stitching algorithm.
git-stitch-repo
Philippe Bruhat (BooK)
Copyright 2008-2009 Philippe Bruhat (BooK), All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
| Git-FastExport documentation | Contained in the Git-FastExport distribution. |
package Git::FastExport::Stitch; use strict; use warnings; use Carp; use Scalar::Util qw( blessed ); use List::Util qw( first ); use Git::FastExport; our $VERSION = '0.07'; 'progress 1 objects'; sub new { my ( $class, $options, @args ) = @_; # create the object my $self = bless { # internal structures mark => 1_000_000, # mark counter in the new repo mark_map => {}, commits => {}, repo => {}, name => 'A', cache => {}, # default options select => 'last', }, $class; # set the options for my $key (qw( select )) { $self->{$key} = $options->{$key} if exists $options->{$key}; } croak "Invalid value for 'select' option: '$self->{select}'" if $self->{select} !~ /^(?:first|last|random)$/; # process the remaining args $self->stitch( splice @args, 0, 2 ) while @args; return $self; } # add a new repo to stich in sub stitch { my ( $self, $repo, $dir ) = @_; my $export = blessed($repo) && $repo->isa('Git::FastExport') ? $repo : eval { Git::FastExport->new($repo) }; $@ =~ s/ at .*\z//s, croak $@ if !$export; # initiate the Git::FastExport stream $export->fast_export(qw( --progress=1 --all --date-order )) if !$export->{export_fh}; # do not stich a repo with itself $repo = $export->{source}; croak "Already stitching repository $repo" if exists $self->{repo}{$repo}; # set up the internal structures $export->{mapdir} = $dir; $self->{repo}{$repo}{repo} = $repo; $self->{repo}{$repo}{dir} = $dir; $self->{repo}{$repo}{parser} = $export; $self->{repo}{$repo}{name} = $self->{name}++; $self->{repo}{$repo}{block} = $export->next_block(); $self->_translate_block( $repo ); return $self; } # return the next block in the stitched stream sub next_block { my ($self) = @_; my $repo = $self->{repo}; # keep a list of next blocks (per repo) # any undef block means the stream is finished delete $repo->{$_} for grep { !defined $repo->{$_}{block} } keys %$repo; # no repo left, we're done return if ! keys %$repo; # return any non-commit block directly if ( my $next = first { $repo->{$_}{block}{type} ne 'commit' } keys %$repo ) { my $block = $repo->{$next}{block}; $repo->{$next}{block} = $repo->{$next}{parser}->next_block(); $self->_translate_block( $next ); return $block; } # select the oldest alvailable commit my ($next) = keys %$repo; $next = $repo->{$next}{block}{date} < $repo->{$_}{block}{date} ? $next : $_ for keys %$repo; my $commit = $repo->{$next}{block}; # fetch the next block $repo->{$next}{block} = $repo->{$next}{parser}->next_block(); $self->_translate_block( $next ); # prepare the attachement algorithm $repo = $repo->{$next}; my $commits = $self->{commits}; # first commit in the old repo linked to latest commit in new repo if ( $self->{last} && !$commit->{from} ) { $commit->{from} = ["from :$self->{last}"]; } # update historical information my ($id) = $commit->{mark}[0] =~ /:(\d+)/g; $self->{last} = $id; # last commit applied my $branch = ( split / /, $commit->{header} )[1]; my $node = $commits->{$id} = { name => $id, repo => $repo->{repo}, branch => $branch, children => [], parents => {}, merge => exists $commit->{merge}, }; # mark our original source $commit->{header} =~ s/$/-$repo->{name}/; # this commit's parents my @parents = map {/:(\d+)/g} @{ $commit->{from} || [] }, @{ $commit->{merge} || [] }; # get the reference parent list used by _last_alien_child() my $parents = {}; for my $parent (@parents) { for my $repo ( keys %{ $commits->{$parent}{parents} } ) { $parents->{$repo}{$_} = 1 for keys %{ $commits->{$parent}{parents}{$repo} }; } } # map each parent to its last "alien" commit my %parent_map = map { $_ => $self->_last_alien_child( $commits->{$_}, $branch, $parents )->{name} } @parents; # map parent marks for ( @{ $commit->{from} || [] }, @{ $commit->{merge} || [] } ) { s/:(\d+)/:$parent_map{$1}/g; } # update the parents information $self->_add_parents( $node => map { $commits->{ $parent_map{$_} } } @parents ); # dump the commit return $commit; } sub _translate_block { my ( $self, $repo ) = @_; my $mark_map = $self->{mark_map}; my $block = $self->{repo}{$repo}{block}; # nothing to do return if !defined $block; # map to the new mark for ( @{ $block->{mark} || [] } ) { s/:(\d+)/:$self->{mark}/; $mark_map->{$repo}{$1} = $self->{mark}++; } # update marks in from & merge for ( @{ $block->{from} || [] }, @{ $block->{merge} || [] } ) { s/:(\d+)/:$mark_map->{$repo}{$1}/g; } # update marks & dir in files for ( @{ $block->{files} } ) { s/^M (\d+) :(\d+)/M $1 :$mark_map->{$repo}{$2}/; if ( my $dir = $self->{repo}{$repo}{dir} ) { s!^(M \d+ :\d+) (.*)!$1 $dir/$2!; # filemodify s!^D (.*)!D $dir/$1!; # filedelete # /!\ quotes may happen - die and fix if needed die "Choked on quoted paths in $repo! Culprit:\n$_\n" if /^[CR] \S+ \S+ /; # filecopy | filerename s!^([CR]) (\S+) (\S+)!$1 $dir/$2 $dir/$3!; } } } # given a commit (item from $self->{commits}) # add the parents from the given commits to it sub _add_parents { my ( $self, $node, @parents ) = @_; for my $parent (@parents) { push @{ $parent->{children} }, $node->{name}; for my $repo_name ( keys %{ $parent->{parents} } ) { $node->{parents}{$repo_name}{$_} = 1 for keys %{ $parent->{parents}{$repo_name} || {} }; } $node->{parents}{ $parent->{repo} }{ $parent->{name} } = 1; } return $node; } # find the last child of this node # that has either no child # or a child in our repo # or an alien child that has the same parent list sub _last_alien_child { my ( $self, $node, $branch, $parents ) = @_; my $commits = $self->{commits}; my $from = $node->{name}; my $repo = $node->{repo}; my $old = ''; while ( $node ne $old ) { $old = $node; # no children nodes return $node if ( !@{ $node->{children} } ); # some children nodes are local return $node if grep { $commits->{$_}{repo} eq $repo } @{ $node->{children} }; # all children are alien to us my @valid; for my $id ( @{ $node->{children} } ) { my $peer = $commits->{$id}; # parents of $peer in $peer's repo contains # all parents from $parent in $peer's repo next if grep { !exists $peer->{parents}{ $peer->{repo} }{$_} } keys %{ $parents->{ $peer->{repo} } }; # this child node has a valid parent list push @valid, $id; } # compute the commit to attach to, using the requested algorithm if (@valid) { my $node_id = $self->{cache}{"$from $node->{name}"} ||= $self->{select} eq 'last' ? $valid[-1] : $self->{select} eq 'first' ? $valid[0] : $valid[ rand @valid ]; $node = $commits->{$node_id}; } } # return last valid child return $node; } __END__