Net::CascadeCopy - Rapidly propagate (rsync/scp/...) files to many servers in multiple locations.


Net-CascadeCopy documentation  | view source Contained in the Net-CascadeCopy distribution.

NAME

Net::CascadeCopy - Rapidly propagate (rsync/scp/...) files to many servers in multiple locations.

VERSION

version 0.2.6

SYNOPSIS

    use Net::CascadeCopy;

    # create a new CascadeCopy object
    my $ccp = Net::CascadeCopy->new( { ssh          => "/path/to/ssh",
                                       ssh_flags    => "-x -A",
                                       max_failures => 3,
                                       max_forks    => 2,
                                       output       => "log",
                                   } );

    # set the command and arguments to use to transfer file(s)
    $ccp->set_command( "rsync", "-rav --checksum --delete -e ssh" );

    # another example with scp instead
    $ccp->set_command( "/path/to/scp", "-p" );




    # set path on the local server
    $ccp->set_source_path( "/path/on/local/server" );
    # set path on all remote servers
    $ccp->set_target_path( "/path/on/remote/servers" );

    # add lists of servers in multiple datacenters
    $ccp->add_group( "datacenter1", \@dc1_servers );
    $ccp->add_group( "datacenter2", \@dc2_servers );

    # transfer all files
    $ccp->transfer();




DESCRIPTION

This module implements a scalable method of quickly propagating files to a large number of servers in one or more locations via rsync or scp.

A frequent solution to distributing a file or directory to a large number of servers is to copy it from a central file server to all other servers. To speed this up, multiple file servers may be used, or files may be copied in parallel until the inevitable bottleneck in network/disk/cpu is reached. These approaches run in O(n) time.

This module and the included script, ccp, take a much more efficient approach that is O(log n). Once the file(s) have been copied to a remote server, that server is promoted to a source server for copying to the remaining servers. Thus, the rate of transfer increases exponentially rather than linearly.
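The difference between the two approaches can be illustrated with a small simulation. This is only a sketch of a simplified model, not the module's actual scheduler: it assumes every transfer takes one "round", never fails, and that each host holding the files pushes to a fixed number of new hosts per round. The function names cascade_rounds and central_rounds are made up for this example.

```perl
use strict;
use warnings;

# Simplified model (an assumption, not the module's scheduler): every host
# that already holds the files pushes to $max_forks new hosts per round.
sub cascade_rounds {
    my ( $targets, $max_forks ) = @_;
    my $sources = 1;    # the local server holds the files at the start
    my $rounds  = 0;
    while ( $targets > 0 ) {
        my $copied = $sources * $max_forks;
        $copied = $targets if $copied > $targets;
        $targets -= $copied;
        $sources += $copied;    # freshly copied hosts become sources
        $rounds++;
    }
    return $rounds;
}

# Single central source: the per-round capacity never grows.
sub central_rounds {
    my ( $targets, $max_forks ) = @_;
    return int( ( $targets + $max_forks - 1 ) / $max_forks );
}

for my $n ( 10, 100, 1000 ) {
    printf "%5d servers: cascade %2d rounds, central %3d rounds\n",
        $n, cascade_rounds( $n, 2 ), central_rounds( $n, 2 );
}
```

Under this model, 1000 servers with two transfers per source need 7 rounds with cascading, compared with 500 rounds from a single source.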

Servers can be specified in groups (e.g. datacenter) to prevent copying across groups. This maximizes the number of transfers done over a local high-speed connection (LAN) while minimizing the number of transfers over the WAN.
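As a rough illustration of why grouping helps, the sketch below counts transfers under two assumptions: the local server sits outside every group, and every transfer succeeds on the first attempt. Each group then receives exactly one copy from the local server, and every other host in the group is fed from inside the group. The host names and the count_transfers helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical group layout; the host names are placeholders.
my %groups = (
    datacenter1 => [ map { "dc1-host$_" } 1 .. 20 ],
    datacenter2 => [ map { "dc2-host$_" } 1 .. 12 ],
);

# Returns ( copies from the local server into groups, copies within groups ).
sub count_transfers {
    my ($groups) = @_;
    my ( $seed, $local ) = ( 0, 0 );
    for my $servers ( values %$groups ) {
        next unless @$servers;
        $seed++;                      # one copy from the local server
        $local += @$servers - 1;      # the rest are copied inside the group
    }
    return ( $seed, $local );
}

my ( $seed, $local ) = count_transfers( \%groups );
print "transfers into groups: $seed, transfers within groups: $local\n";
```

With 32 servers in two groups, only 2 transfers leave the local server; the remaining 30 stay inside their own group.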

The number of simultaneous transfers per source point is configurable. The total number of simultaneously forked processes is limited via Proc::Queue, and is currently hard-coded to 32.

CONSTRUCTOR

new( { option => value } )

Returns a reference to a new Net::CascadeCopy object.

Supported options:

ssh => "/path/to/ssh"

Name or path of the ssh script to use to log in to each remote server to begin a transfer to another remote server. Default is simply "ssh", to be invoked from $PATH.

ssh_flags => "-x -A"

Command line options to be passed to the ssh script. Default is to disable X11 forwarding (-x) and enable agent forwarding (-A).

max_failures => 3

The maximum number of transfer failures to allow before giving up on a target host. Default is 3.

max_forks => 2

The maximum number of simultaneous transfers that should be running per source server. Default is 2.

output => undef

Specify options for child process output. The default is to discard stdout and display stderr. "log" can be specified to redirect stdout and stderr of each transfer to ccp.sourcehost.targethost.log. A "stdout" option also exists, which does not suppress stdout, but it is intended only for debugging.

INTERFACE

$self->add_group( $groupname, \@servers )

Add a group of servers. Ideally all servers will be located in the same datacenter. This may be called multiple times with different group names to create multiple groups.

$self->get_groups()

Get list of groups. List is sorted by the order in which the groups were added.

$self->set_command( $command, $args )

Set the command and arguments that will be used to transfer files. For example, "rsync" and "-ravuz" could be used for rsync, or "scp" and "-p" could be used for scp.

$self->set_source_path( $path )

Specify the path on the local server where the source files reside.

$self->set_target_path( $path )

Specify the target path on the remote servers where the files should be copied.

$self->transfer( )

Transfer all files. Will not return until all files are transferred.

$self->get_transfer_map( )

Returns a data structure describing the transfers that were performed, i.e. which hosts were used as the sources for which other hosts.

BUGS AND LIMITATIONS

Note that this is still a beta release.

There is one known bug. If an initial copy from the localhost to the first server in one of the groups fails, it will not be retried. The real solution to this bug is to refactor the logic for the initial copy from localhost; the current logic is a hack. Max forks should be configurable for localhost transfers, and localhost could be listed in a group to allow it to be reused by that group once all the initial transfers to the first server in each group have completed.

If using rsync for the copy mechanism, it is recommended that you use the "--delete" and "--checksum" options. Otherwise, if the content of the directory structure varies slightly from system to system, then you may potentially sync different files from some servers than from others.

Since the copies will be performed between machines, you must be able to log in from each source server to each target server (in the same group). Since empty passwords on ssh keys are insecure, the default ssh arguments enable ssh agent forwarding for authentication (the -A option). Note that each server will need an entry in .ssh/known_hosts for each other server.

Multiple syncs will be initialized within a few seconds on remote hosts. Ideally this could be configurable to wait a certain amount of time before starting additional syncs. This would give rsync some time to finish computing checksums, a potential disk/cpu bottleneck, and move into the network bottleneck phase before starting the next transfer.

There is no timeout enforced in CascadeCopy yet. A copy command that hangs forever will prevent CascadeCopy from ever completing.
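Until a timeout exists in the module, a caller could impose an overall limit from the outside. The following is a minimal sketch using Perl's built-in alarm; the run_with_timeout helper is hypothetical and has not been tested against CascadeCopy itself. It only stops the calling code from waiting: any child processes that were already forked are not cleaned up here, and the approach assumes the wrapped code does not install its own ALRM handler.

```perl
use strict;
use warnings;

# Run $code, giving up after $seconds.
# Returns 1 if $code completed, 0 if the timeout fired.
sub run_with_timeout {
    my ( $seconds, $code ) = @_;
    my $finished = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;
        1;
    };
    alarm 0;    # make sure no alarm is left pending
    return 1 if $finished;
    die $@ unless $@ eq "timeout\n";    # re-throw unrelated errors
    return 0;
}

# Hypothetical usage with a CascadeCopy object in $ccp:
#   run_with_timeout( 3600, sub { $ccp->transfer() } )
#       or warn "transfer did not finish within an hour\n";

my $ok = run_with_timeout( 2, sub { return "done" } );
print $ok ? "completed\n" : "timed out\n";
```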

Please report problems to VVu@geekfarm.org. Patches are welcome.

SUPPORT AND DOCUMENTATION

    RT, CPAN's request tracker
        http://rt.cpan.org/NoAuth/Bugs.html?Dist=Net-CascadeCopy

    AnnoCPAN, Annotated CPAN documentation
        http://annocpan.org/dist/Net-CascadeCopy

    Search CPAN
        http://search.cpan.org/dist/Net-CascadeCopy




SEE ALSO

ccp - command line script distributed with this module

http://www.geekfarm.org/wu/muse/CascadeCopy.html

CONTRIBUTORS

0.2.3 incorporates a fix from twelch for an endless loop that occurred when the initial transfer failed.

AUTHOR

Alex White <vvu@geekfarm.org>

LICENCE AND COPYRIGHT
