Parallel-MapReduce

# THIS IS ALL STILL EXPERIMENTAL!!
# DO NOT USE FOR PRODUCTION!!

What is it?

This is a pure-Perl implementation of MapReduce, the distributed algorithm introduced by Google.

See the RESOURCES file in this distribution and also

http://kill.devc.at/node/197

http://kill.devc.at/node/195

What are the external dependencies?

Apart from the normal CPAN dependencies, following software must be available for full operation:

apt-get install openssh-client
apt-get install memcached

How to get started?

  1. Write your algorithm in MR form and test it with

Parallel::MapReduce::Testing

That is a sequential, local implementation.

2) If you want to test it in a distributed setting, switch

to

Parallel::MapReduce::Sequential

and use the worker class

Parallel::MapReduce::Worker::SSH

     You (your process) need/s to have a working ssh account on
     every worker box you want to use. The public key of the SSH
     servers should be known and accepted locally and you should
     be able to log into the remote workers without giving a
     password. Otherwise the worker implementation will just block
     there waiting for your input.

3) Make sure your memcached daemons are running on each of

the servers:

/etc/init.d/memcached start

4) Cross your fingers and run your app.

How to install?

To install this module type the following:

perl Makefile.PL
make
make test
make install

>>> NOTE!! <<<

At this stage the tests are only testing local functionality, not any which involves the network setup. That cannot be detected or guessed. If you have a memcached on your local machine running, then

export MR_ONLINE=1
make test

will run a test against that. Do not be scared by the large amount of test output. That will eventually go away.

Copyright (C) 2008 by Robert Barta

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.