Sort::Bucket - a fast XS bucket sort


Sort-Bucket documentation Contained in the Sort-Bucket distribution.

Index


Code Index:

NAME

Top

Sort::Bucket - a fast XS bucket sort

SYNOPSIS

Top

  use Sort::Bucket qw(inplace_bucket_sort);

  inplace_bucket_sort(@some_array);

DESCRIPTION

Top

A fast XS implementation of the Bucket Sort algorithm. If your data is well suited to a bucket sort then this can be significantly faster than Perl's builtin sort() function.

A bucket sort works best when there is a large amount of variation in the first few bytes of the strings to be sorted.

LIMITATIONS

Top

Only in-place sorting is implemented so far.

Limited to sorting in standard string comparison order.

Sort::Bucket can be used to sort either byte strings or character strings, but not a mixture. If you apply Sort::Bucket to an array containing both byte strings and character strings then it may sort them into the wrong order, so don't do that.

There is no locale support, i.e. use locale will not effect the order into which this module sorts your strings as it does with Perl's builtin sort.

If the array is already partially sorted, Perl's builtin sort() can take advantage of that to sort it very quickly. For this reason, Sort::Bucket can be slower than Perl's builtin sort on partially sorted input, even with data that's well suited to a bucket sort.

EXPORTS

Top

Exports nothing by default. The following functions are available for import:

inplace_bucket_sort( ARRAY_OF_STRINGS [,BUCKET_BITS] )

Sorts ARRAY_OF_STRINGS in-place.

BUCKET_BITS specifies the number of bits from the start of each string to use to select the bucket into which to place the string. If set, it must be an integer from 1 to 31.

The elements of the array will be distributed into 2**BUCKET_BITS buckets, and then sorted within each bucket. If BUCKET_BITS is omitted then a sensible value will be selected based on the size of the array.

ALGORITHM

Top

For each element in the array, a major bucket and a minor bucket value is computed. The major bucket is the first BUCKET_BITS bits of the string, and the minor bucket is the next 32 bits.

The elements are then distributed into 2**BUCKET_BITS major buckets, according to their major bucket values. The minor bucket value is stored along with each element.

For each of the 2**BUCKET_BITS possible major bucket values, the elements in that bucket are sorted by minor bucket. Any that compare equal by minor bucket are further sorted using Perl's native sort.

The extra step of sorting by minor bucket before falling back to Perl's sort speeds up the process, since comparing 32-bit integers is much faster than comparing Perl scalars.

SEE ALSO

Top

sort in perlfunc

There are several other fast sorting modules on CPAN, any of which may be faster than Sort::Bucket for your data. For example, Sort::Key. Sort::Maker, Sort::Merge.

If the data to be sorted does not fit into memory, you probably want Sort::External.

AUTHOR

Top

Nick Cleaton, <nick@cleaton.net>

COPYRIGHT AND LICENSE

Top


Sort-Bucket documentation Contained in the Sort-Bucket distribution.

package Sort::Bucket;

use 5.008001;
use strict;
use warnings;

require Exporter;

our @ISA = qw(Exporter);
our @EXPORT_OK = qw(inplace_bucket_sort);

our $VERSION = '0.02';

require XSLoader;
XSLoader::load('Sort::Bucket', $VERSION);

1;
__END__