Algorithm::Points::MinimumDistance - Works out the distance from each point to its nearest neighbour. Kinda.


Algorithm-Points-MinimumDistance documentation Contained in the Algorithm-Points-MinimumDistance distribution.

Index


Code Index:

NAME

Top

Algorithm::Points::MinimumDistance - Works out the distance from each point to its nearest neighbour. Kinda.

DESCRIPTION

Top

Given a set of points in N-dimensional Euclidean space, works out for each point the distance to its nearest neighbour (unless its nearest neighbour isn't very close). The distance metric is a method; subclass and override it for non-Euclidean space.

SYNOPSIS

Top

  use Algorithm::Points::MinimumDistance;

  my @points = ( [1, 4], [3, 1], [5, 7] );
  my $dists = Algorithm::Points::MinimumDistance->new( points => \@points );

  foreach my $point (@points) {
      print "($point->[0], $point->[1]: Nearest neighbour distance is "
          . $dists->distance( point => $point ) . "\n";
  }

  print "Smallest distance between any two points is "
      . $dists->min_distance . "\n";

METHODS

Top

new
  my @points = ( [1, 4], [3, 1], [5, 7] );
  my $dists = Algorithm::Points::MinimumDistance->new( points  => \@points,
                                                       boxsize => 2 );

points should be an arrayref containing every point in your set. The representation of a point is as an N-element arrayref where N is the number of dimensions in which we are working. There is no check that all of your points have the right dimension. Whatever dimension the first point has is assumed to be the dimension of the space. So get it right.

boxsize defaults to 20.

box
  my @box = $nn->box( [1, 2] );

Returns the identifier of the box that the point lives in. A box is labelled by its "bottom left-hand" corner point.

distance
  my $nn = Algorithm::Points::MinimumDistance->new( ... );
  my $nn_dist = $nn->distance( point => [1, 4] );

Returns the distance between the specified point and its nearest neighbour. The point should be one of your original set. There is no check that this is the case. Note that if a point has no particularly close neighbours, then boxsize will be returned instead.

min_distance
  my $nn = Algorithm::Points::MinimumDistance->new( ... );
  my $nn_dist = $nn->min_distance;

Returns the minimum nearest-neighbour distance for all points in the set. Or boxsize if none of the points are close to each other.

ALGORITHM

Top

We use the hash as an approximate conservative metric to basically do clipping of space. A box is one cell of the space defined by the grid size. A region is a box and all the neighbouring boxes in all directions, i.e. all the boxes b such that d(b, c) <= 1 in the d-infinity metric Noting that d(b, c) is always an integer in this case.

  +-+-+-+-+-+
  | | | | | |
  +-+-+-+-+-+
  | |x|x|x| |
  +-+-+-+-+-+
  | |x|b|x| |
  +-+-+-+-+-+
  | |x|x|x| |
  +-+-+-+-+-+
  | | | | | |
  +-+-+-+-+-+ 

Now all points outside the region defined by the box b and the neighbours x can not be within maximum radius $C of any point in box b.

So we reverse the stunt and shove any point in box b into the hash lists for all boxes b and x so that when testing a point in any box, we have a list of all points in that box and any neighbouring boxes (the region).

AUTHOR

Top

Algorithm by Shevek, modularisation by Kake Pugh (kake@earth.li).

COPYRIGHT

Top

CREDITS

Top

Shevek is utterly fab for telling me how to solve this problem. Greg McCarroll is also fab for telling me what to call the module.


Algorithm-Points-MinimumDistance documentation Contained in the Algorithm-Points-MinimumDistance distribution.
package Algorithm::Points::MinimumDistance;
use strict;
use vars qw( $VERSION );
$VERSION = '0.01';

sub new {
    my ($class, %args) = @_;
    my @points = @{ $args{points} };
    my $dim = scalar @{ $points[0] };
    my $boxsize = $args{boxsize} || 20;

    # Precomputation for working out all boxes adjacent to a given box
    # (a point will be in all regions centred on its box or the
    # adjacent ones).
    # To find an adjacent box, vector-add one of these entries to it,
    # eg [1, 1] + [2, 0] - with a boxsize of 2, [3, 1] is adjacent to [1, 1].
    my @offsets = ( [ -$boxsize ], [ 0 ] , [ $boxsize ] );
    foreach (2..$dim) {
        @offsets = map { [ @$_, -$boxsize ], [ @$_, 0 ], [ @$_, $boxsize] }
                       @offsets;
    }

    my $self = { dimensions => $dim,
                 points     => \@points,
                 boxsize    => $boxsize,
                 offsets    => \@offsets,
                 regions    => { },
                 distances  => { }
	       };
    bless $self, $class;
    $self->_work_out_distances;

    return $self;
}

sub box {
    my ($self, $point) = @_;
    my @box = map { $_ - ($_ % $self->{boxsize}) } @$point;
    return @box;
}

sub _offsets {
    my $self = shift;
    return @{ $self->{offsets} };
}

# Accessor for the region centred on the box $args{centre}.  Returns a ref to
# an array of the points that are in that region.
sub region {
    my ($self, %args) = @_;
    my @centre = @{$args{centre}};
    my $key = join(",", @centre);
    my $regions = $self->{regions};
    $regions->{$key} ||= [];
    return $regions->{$key};
}

# Shevek says: "This is where the CPU time goes, but, gentle reader,
# please note that the complexity is LINEAR in the number of
# points. This is shit.  It's also trivial, so do it in XS."
# Kake says: "I don't speak XS yet."

sub _hash {
    my ($self, $point) = @_;

    # Compute the box in which $point lives.
    my @home_box = $self->box($point);

    # $point lives in the region centred on this box, plus all surrounding
    # regions.  Push it into each of these regions.  A region is
    # identified by the box at its centre.
    foreach my $delta ( $self->_offsets ) {
        my @centre = map { $delta->[$_] + $home_box[$_] } (0..$#home_box);
        my $region = $self->region( centre => \@centre );
        push @$region, $point;
    }
}

sub _work_out_distances {
    my $self = shift;
    my $points = $self->{points};

    # Work out which points live in which regions.
    $self->_hash($_) foreach (@$points);

    # For each point, check its distance from every other point inside
    # the region centred on its home box.  All points outside this region
    # are at least a distance 'boxsize' away.
    foreach my $point (@$points) {
        my @box = $self->box($point);
        my $min;
        my $region = $self->region( centre => \@box );
        foreach my $neighbour (@$region) {
            next if $neighbour == $point;    # Reference equality
            my $d = $self->d($point, $neighbour);
            $min = $d if (!defined $min or $d < $min);
        }
        $min ||= $self->{boxsize};
        $self->_store_distance( point => $point, distance => $min );
    }
}

sub _store_distance {
    my ($self, %args) = @_;
    my ($point, $distance) = @args{ qw( point distance ) };
    my $key = join(",", @$point);
    $self->{distances}{$key} = $distance;
}

# Override this for a different metric.
sub d {
    my ($self, $point1, $point2) = @_;
    my $t = 0;
    foreach (0..$#$point1) {
        $t += ($point1->[$_] - $point2->[$_]) ** 2;
    }
    return sqrt($t);
}

sub distance {
    my ($self, %args) = @_;
    my $point = $args{point};
    my $key = join(",", @$point);
    return $self->{distances}{$key};
}

sub min_distance {
    my $self = shift;
    my $dists = $self->{distances};
    my $min;
    foreach my $dist ( values %$dists ) {
        $min = $dist if (!defined $min or $dist < $min);
    }
    return $min;
}

1;