Algorithm::SixDegrees - Find a path through linked elements in a set


Algorithm-SixDegrees documentation Contained in the Algorithm-SixDegrees distribution.

Index


Code Index:

NAME

Top

Algorithm::SixDegrees - Find a path through linked elements in a set

VERSION

Top

Version 0.03

SYNOPSIS

Top

	use Algorithm::SixDegrees;

	my $sd1 = Algorithm::SixDegrees->new();
	$sd1->data_source( actors => \&starred_in );
	$sd1->data_source( movies => \&stars_of );
	@elems = $sd1->make_link('actors', 'Tom Cruise', 'Kevin Bacon');

	my $sd2 = Algorithm::SixDegrees->new();
	$sd2->forward_data_source( friends => \&friends, @args );
	$sd2->reverse_data_source( friends => \&friend_of, @args );
	@elems = $sd2->make_link('friends', 'Bob', 'Mark');

DESCRIPTION

Top

Algorithm::SixDegrees is a Perl implementation of a breadth-first search through a set of linked elements in order to find the shortest possible chain linking two specific elements together.

In simpler terms, this module will take a bunch of related items and attempt to find a relationship between two of them. It looks for the shortest (and generally, simplest) relationship it can find.

CONSTRUCTOR

Top

new()

Algorithm::SixDegrees requires use as an object; it can't (yet) be used as a stand-alone module. new takes no arguments, however.

FUNCTIONS

Top

forward_data_source( name => \&sub, @args );

Tells Algorithm::SixDegrees that all items in the data set relating to name can be retrieved by calling sub. See SUBROUTINE RULES.

In our friends example above, if Bob considers Mark a friend, but Mark doesn't consider Bob a friend, calling the sub with "Bob" as an argument should return "Mark", but calling the sub with "Mark" as an argument should not return "Bob".

reverse_data_source( name => \&sub, @args );

Tells Algorithm::SixDegrees that all items in the data set related to by name can be retrieved by calling sub. See SUBROUTINE RULES.

In the same friends example, calling the sub with "Bob" as an argument should not return "Mark", but calling the sub with "Mark" as an argument should return "Bob".

data_source( name => \&sub, @args );

Sets up a data source as both forward and reverse. This is useful if the data source is mutually relational; that is, in our actors/movies example, Kevin Bacon is always in Mystic River, and Mystic River always has Kevin Bacon in it.

error

Returns the current value of $Algorithm::SixDegrees::ERROR. See SUBROUTINE RULES.

SUBROUTINE RULES

Top

Passed-in subroutines should take at least one argument, which should be some form of unique identifier, and return a list of unique identifiers that have a relation to the argument.

The unique identifiers must be able to be compared with eq.

The identifiers should be unique in datatype; that is, in an actor/movie relationship, "Kevin Bacon" can be both the name of an actor and a movie.

A linked data type must return identifiers that relate across the link; that is, for an actor/movie relationship, an actor subroutine should return movies, and a movie subroutine should return actors.

Additional arguments can be provided; these will be stored in the object and passed through as the second and further arguments to the subroutine. This may be useful, for example, if you're using some form of results caching and need to pass a tied handle around.

If you return explicit undef, please set $Algorithm::SixDegrees::ERROR with an error code. Explicit undef means that an error occurred that should terminate the search; it should be returned as a one-element list.

AUTHOR

Top

Pete Krawczyk, <petek@cpan.org>

BUGS

Top

Please report any bugs or feature requests to bug-algorithm-sixdegrees@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

ACKNOWLEDGEMENTS

Top

Andy Lester and Ricardo Signes wrote Module::Starter, which helped get the framework up and running fairly quickly.

Brad Fitzpatrick of http://livejournal.com for giving me access to a LiveJournal interface to determine linking information on that site, which enabled me to write the algorithm that has been reduced into this module.

COPYRIGHT & LICENSE

Top


Algorithm-SixDegrees documentation Contained in the Algorithm-SixDegrees distribution.
package Algorithm::SixDegrees;

require 5.006;
use warnings;
use strict;
use UNIVERSAL qw(isa);

our $VERSION = '0.03';
our $ERROR = '';

sub new {
	my $class = shift;
	my $self = {
		_source_left  => {},
		_source_right => {},
		_sources      => [],
		_investigated => {},
	};
	return bless $self,$class;
}

sub forward_data_source {
	my ($self, $name, $sub, @args) = @_;
	die "Data sources must be named\n" unless defined($name);
	die "Data sources must have code supplied\n" unless defined($sub);
	die "Data sources must have a coderef argument\n" unless ref($sub) && isa($sub,'CODE');
	$self->{'_source_left'}{$name}{'sub'} = $sub;
	$self->{'_source_left'}{$name}{'args'} = \@args;
	foreach my $source (@{$self->{'_sources'}}) {
		return if $source eq $name;
	}
	push(@{$self->{'_sources'}},$name);
	return;
}

sub reverse_data_source {
	my ($self, $name, $sub, @args) = @_;
	die "Data sources must be named\n" unless defined($name);
	die "Data sources must have code supplied\n" unless defined($sub);
	die "Data sources must have a coderef argument\n" unless ref($sub) && isa($sub,'CODE');
	$self->{'_source_right'}{$name}{'sub'} = $sub;
	$self->{'_source_right'}{$name}{'args'} = \@args;
	foreach my $source (@{$self->{'_sources'}}) {
		return if $source eq $name;
	}
	push(@{$self->{'_sources'}},$name);
	return;
}

sub data_source {
	my ($self, $name, $sub, @args) = @_;
	$self->forward_data_source($name,$sub,@args);
	$self->reverse_data_source($name,$sub,@args);
	return;
}

sub make_link {
	my ($self, $mainsource, $start, $end) = @_;
	$ERROR = undef;

	unless (ref($self) && isa($self,__PACKAGE__)) {
		$ERROR = 'Invalid object reference used to call make_link';
		return;
	}
	unless (defined($mainsource)) {
		$ERROR = 'Data set name is not defined';
		return;
	}
	unless (defined($start)) {
		$ERROR = 'Starting identifier is not defined';
		return;
	}
	unless (defined($end)) {
		$ERROR = 'Ending identifier is not defined';
		return;
	}

	# Assume working from "left to right"; therefore, links leading
	# from the starting identifier are on the "left side", and links
	# leading to the ending identifier are on the "right side".
	my %leftside = (); 
	my %rightside = ();

	# If $altsource gets defined, that means there are two sources used.
	my $altsource;

	unless (exists($self->{'_sources'}) && isa($self->{'_sources'},'ARRAY')) {
		$ERROR = 'No data sources defined';
		return;
	}
	my @sources = @{$self->{'_sources'}};
	my $source_exists = 0;
	foreach my $source (@sources) {
		if ($mainsource eq $source) {
			$source_exists = 1;
			$leftside{$source} = {$start, undef};
			$rightside{$source} = {$end, undef};
		} else {
			$altsource = $source;
			$leftside{$source} = {};
			$rightside{$source} = {};
		}
		unless (ref($self->{'_source_left'}) && 
			ref($self->{'_source_left'}{$source}) &&
			isa($self->{'_source_left'}{$source}{'sub'},'CODE')) {
			$ERROR = "Source '$source' does not have a valid forward subroutine";
			return;
		}
		unless (ref($self->{'_source_right'}) && 
			ref($self->{'_source_right'}{$source}) &&
			isa($self->{'_source_right'}{$source}{'sub'},'CODE')) {
			$ERROR = "Source '$source' does not have a valid reverse subroutine";
			return;
		}
		$self->{'_investigated'}{$source} = {};
	}
	unless ($source_exists) {
		$ERROR = "Source '$mainsource' was not defined";
		return;
	}
	if (scalar(keys(%leftside)) > 2) {
		$ERROR = 'Too many defined data sources; maximum is 2';
		return;
	}


	if ($start eq $end) {
		# Only one element if the start and end are the same.
		return wantarray ? ($start) : [$start];
	}

	my $leftcount = 1;
	my $rightcount = 1;

	# If altsource exists, pull the left side main, then pull the right side main,
	# and check for middle matches.  This reduces database hits as opposed to
	# where it's pulled left main - left alt; left alt >= 1 at that point, whereas
	# right main on the first loop == 1.  Following that, pull the left alt and 
	# then the right alt, which gets the CHAINLOOP back in synch.

	if (defined($altsource)) {
		my ($count,$id,$err) = $self->_match('left',$mainsource,$altsource,\%leftside,\%rightside);
		if (defined($err)) { $ERROR = $err; return; };
		if (defined($id)) { $ERROR = 'Internal error, id cannot match here'; return; };
		return if !defined($count) || $count == 0;

		($count,$id,$err) = $self->_match('right',$mainsource,$altsource,\%rightside,\%leftside);
		if (defined($err)) { $ERROR = $err; return; };
		if (defined($id)) { 
			my @abc = ($leftside{$altsource}{$id},$id,$rightside{$altsource}{$id});
			return wantarray ? @abc : \@abc;
		};
		return if !defined($count) || $count == 0;

		($leftcount,$id,$err) = $self->_match('left',$altsource,$mainsource,\%leftside,\%rightside);
		if (defined($err)) { $ERROR = $err; return; };
		if (defined($id)) { $ERROR = 'Internal error, id cannot match here'; return; };
		return if !defined($leftcount) || $leftcount == 0;

		($rightcount,$id,$err) = $self->_match('right',$altsource,$mainsource,\%rightside,\%leftside);
		if (defined($err)) { $ERROR = $err; return; };
		if (defined($id)) { 
			my $la = $leftside{$mainsource}{$id};
			my $lm = $leftside{$altsource}{$la};
			my $ra = $rightside{$mainsource}{$id};
			my $rm = $rightside{$altsource}{$ra};
			unless (defined($la) && defined($lm) && defined($ra) && defined($rm)) {
				$ERROR = 'Internal error, identifier not defined';
				return;
			}
			return wantarray ? ($lm,$la,$id,$ra,$rm) : [$lm,$la,$id,$ra,$rm];
		};
		return if !defined($rightcount) || $rightcount == 0;

	}

	# There is bias here, but the tie needs to be broken, so in the
	# event of a tie, move left to right in the chain.

	CHAINLOOP: {
		my $id;
		my $err;
		if ($leftcount <= $rightcount) {
			if (defined($altsource)) {
				($leftcount,$id,$err) = $self->_match_two('left',$mainsource,$altsource,\%leftside,\%rightside);
			} else {
				($leftcount,$id,$err) = $self->_match_one('left',$mainsource,\%leftside,\%rightside);
			}
		} else {
			if (defined($altsource)) {
				($rightcount,$id,$err) = $self->_match_two('right',$mainsource,$altsource,\%rightside,\%leftside);
			} else {
				($rightcount,$id,$err) = $self->_match_one('right',$mainsource,\%rightside,\%leftside);
			}
		}
		if(defined($err)) {
			$ERROR = $err;
			return;
		}
		if(defined($id)) {
			# If _match returns an id, that means a match was found.
			# To get it, we simply have to trace out from the "middle"
			# to get the full link.
			my @match = ($id);
			# middle, building to left.
			while($match[0] ne $start) {
				unshift(@match,$leftside{$mainsource}{$match[0]});
				unshift(@match,$leftside{$altsource}{$match[0]}) if defined($altsource);
				if (!defined($match[0])) {
					$ERROR = 'Internal error, left identifier was not defined';
					return;
				}
			}
			# middle building to right
			while($match[-1] ne $end) {
				push(@match,$rightside{$mainsource}{$match[-1]});
				push(@match,$rightside{$altsource}{$match[-1]}) if defined($altsource);
				if (!defined($match[-1])) {
					$ERROR = 'Internal error, right identifier was not defined';
					return;
				}
			}
			return wantarray ? @match : \@match;
		} 
		if ($leftcount == 0 || $rightcount == 0) {
			last CHAINLOOP;
		}
		redo CHAINLOOP;
	}

	return wantarray ? () : [];
}

sub error {
	return $ERROR;
}

sub _match_two {
	my ($self,$side,$mainsource,$altsource,$thisside,$thatside) = @_;
	# Assume $self is OK since this is an internal function
	my ($count,$id,$err) = $self->_match($side,$mainsource,$altsource,$thisside,$thatside);
	return (undef,undef,$err) if defined($err);
	return ($count,$id,$err) if defined($id);
	return (0,undef,undef) if !defined($count) || $count == 0;
	# mental note: this should never return an id
	# after all, you can't have two mains together in a true
	# alternating chain
	return $self->_match($side,$altsource,$mainsource,$thisside,$thatside);
}

sub _match_one {
	my ($self,$side,$source,$thisside,$thatside) = @_;
	# Assume $self is OK since this is an internal function
	return $self->_match($side,$source,$source,$thisside,$thatside);
}

sub _match {
	my ($self,$side,$fromsource,$tosource,$thisside,$thatside) = @_;
	# Assume $self is OK since this is an internal function
	return (undef,undef,'Internal error: missing code') unless isa($self->{"_source_$side"}{$fromsource}{'sub'},'CODE');
	return (undef,undef,'Internal error: missing side (1)') unless isa($thisside,'HASH');
	return (undef,undef,'Internal error: missing side (2)') unless exists($thisside->{$fromsource});
	return (undef,undef,'Internal error: missing side (3)') unless isa($thatside,'HASH');
	return (undef,undef,'Internal error: missing side (4)') unless exists($thatside->{$tosource});

	my $newsidecount = 0;
	foreach my $id (keys %{$thisside->{$fromsource}}) {
		next if exists($self->{"_investigated"}{$fromsource}{$id});
		$self->{"_investigated"}{$fromsource}{$id} = 1;

		my $use_args = isa($self->{"_source_$side"}{$fromsource}{'args'},'ARRAY') ? 1 : 0;

		my @ids = &{$self->{"_source_$side"}{$fromsource}{'sub'}}($id,($use_args?@{$self->{"_source_$side"}{$fromsource}{'args'}}:()));
		return (undef,undef,$ERROR) if scalar(@ids) == 1 && !defined($ids[0]);
		foreach my $thisid (@ids) {
			unless (exists($thisside->{$tosource}{$thisid})) {
				$thisside->{$tosource}{$thisid} = $id;
				$newsidecount++;
			}
			return (0,$thisid,undef) if exists($thatside->{$tosource}{$thisid});
		}
	}

	return $newsidecount;
}

1; # End of Algorithm::SixDegrees