Statistics::Benford - calculate the deviation from Benford's Law


Statistics-Benford documentation  | view source Contained in the Statistics-Benford distribution.

NAME

Statistics::Benford - calculate the deviation from Benford's Law

SYNOPSIS

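    use Statistics::Benford;

    # %freq maps each digit to the number of times it occurs in the data.
    my $stats = Statistics::Benford->new;
    my $diff = $stats->diff(%freq);      # scalar context
    my %diff = $stats->diff(%freq);      # list context
    my $signif = $stats->signif(%freq);  # scalar context
    my %signif = $stats->signif(%freq);  # list context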

DESCRIPTION

The Statistics::Benford module calculates the deviation from Benford's law, also known as the first-digit law. The law states that for many sources of real-life data, the leading digit follows a logarithmic, not uniform, distribution. This fact can be used to audit data for signs of fraud by comparing the expected frequency of the digits to the actual frequency in the data.
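For base 10, the logarithmic distribution mentioned above gives a leading digit d the expected frequency log10(1 + 1/d). The following minimal sketch computes these values in plain Perl without using the module; the hash name %expected is chosen here for illustration only.

```perl
use strict;
use warnings;

# Expected share of each leading digit d in base 10: log10(1 + 1/d).
# Perl's log() is the natural logarithm, so divide by log(10).
my %expected = map { ($_ => log(1 + 1 / $_) / log(10)) } 1 .. 9;

# Print the shares as percentages; digit 1 is the most common, digit 9 the rarest.
printf "%d: %.1f%%\n", $_, 100 * $expected{$_} for 1 .. 9;
```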

METHODS

$stats = Statistics::Benford->new
$stats = Statistics::Benford->new($base, $pos, $len)

Creates a new Statistics::Benford object. The constructor accepts three optional arguments: the number base, the position of the first significant digit to examine, and the number of digits to examine starting from that position.

The default values are (10, 0, 1): base 10, starting at the first significant digit, examining one digit.
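To illustrate what the three arguments mean mathematically, here is a sketch of the generalized Benford expectation for a group of digits. It is not the module's implementation: the function name expected_dist is hypothetical, and it assumes that the position is counted from zero (which the default of 0 for the first digit suggests) and that digit groups are keyed by their decimal value.

```perl
use strict;
use warnings;

# Hypothetical helper: expected Benford probability for each value of the
# digit group that starts at 0-based position $pos and spans $len digits.
sub expected_dist {
    my ($base, $pos, $len) = @_;
    my $span = $base ** $len;    # number of distinct values a group can take
    my %dist;
    if ($pos == 0) {
        # Leading group: the first digit cannot be zero.
        for my $n ($span / $base .. $span - 1) {
            $dist{$n} = log(1 + 1 / $n) / log($base);
        }
    }
    else {
        # Later group: sum over every possible prefix of $pos digits.
        for my $v (0 .. $span - 1) {
            my $p = 0;
            for my $prefix ($base ** ($pos - 1) .. $base ** $pos - 1) {
                $p += log(1 + 1 / ($prefix * $span + $v)) / log($base);
            }
            $dist{$v} = $p;
        }
    }
    return %dist;
}

my %first  = expected_dist(10, 0, 1);    # first digit, keys 1..9
my %second = expected_dist(10, 1, 1);    # second digit, keys 0..9
printf "P(first digit = 1)  = %.4f\n", $first{1};
printf "P(second digit = 0) = %.4f\n", $second{0};
```

Under these assumptions, the first two digits together would be requested as (10, 0, 2), giving keys 10 through 99.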

%dist = $stats->dist($bool)
%dist = $stats->distribution($bool)

Returns a hash of the expected percentages.

$diff = $stats->diff(%freq)
$diff = $stats->difference(%freq)
%diff = $stats->diff(%freq)
%diff = $stats->difference(%freq)

Given a hash representing the frequency count of the digits in the data to examine, returns the percentage differences of each digit in list context, and the sum of the differences in scalar context.
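The list and scalar behavior described above can be sketched in plain Perl with wantarray. This is an illustration, not the module's code: the helper benford_diff is hypothetical, it handles only the first digit in base 10, and it assumes the difference is the absolute gap between observed and expected proportions expressed as a fraction. The text does not say whether the module reports signed values or scales them to 0-100. The counts are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: per-digit difference between the observed and the
# expected first-digit proportions (base 10).
sub benford_diff {
    my (%freq) = @_;
    my $total = 0;
    $total += $_ for values %freq;

    my %diff;
    my $sum = 0;
    for my $d (1 .. 9) {
        my $expected = log(1 + 1 / $d) / log(10);
        my $observed = $total ? ($freq{$d} // 0) / $total : 0;
        $diff{$d} = abs($observed - $expected);    # assumption: absolute gap
        $sum += $diff{$d};
    }
    # List context: one entry per digit. Scalar context: the sum.
    return wantarray ? %diff : $sum;
}

# Made-up counts for 1,000 values that follow the law closely.
my %freq = (1 => 301, 2 => 176, 3 => 125, 4 => 97, 5 => 79,
            6 => 67,  7 => 58,  8 => 51,  9 => 46);
my %diff = benford_diff(%freq);    # list context
my $diff = benford_diff(%freq);    # scalar context
printf "total deviation: %.4f\n", $diff;
```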

$signif = $stats->signif(%freq)
$signif = $stats->z(%freq)
%signif = $stats->signif(%freq)
%signif = $stats->z(%freq)

Given a hash representing the frequency count of the digits in the data to examine, returns the z-statistic of each digit in list context, and the average of the z-statistics for all the digits in scalar context.

The z-statistic shows the statistical significance of the difference between the observed and expected proportions. Significance takes into account the size of the difference, the expected proportion, and the sample size. Scores above 1.96 are significant at the 0.05 level, and scores above 2.58 are significant at the 0.01 level (two-tailed).
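As an illustration of how such a score combines the difference, the expected proportion, and the sample size, the sketch below applies the textbook one-sample proportion z-test to each first digit in base 10. The helper benford_z is hypothetical and no continuity correction is applied; the module's exact formula may differ. The counts are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: z = |observed - expected| / sqrt(expected * (1 - expected) / n)
sub benford_z {
    my (%freq) = @_;
    my $n = 0;
    $n += $_ for values %freq;
    return unless $n;    # no data, nothing to test

    my %z;
    for my $d (1 .. 9) {
        my $expected = log(1 + 1 / $d) / log(10);
        my $observed = ($freq{$d} // 0) / $n;
        my $std_err  = sqrt($expected * (1 - $expected) / $n);
        $z{$d} = abs($observed - $expected) / $std_err;
    }
    return %z if wantarray;    # list context: one score per digit

    my $sum = 0;
    $sum += $_ for values %z;
    return $sum / 9;           # scalar context: average score
}

# Made-up counts: every digit equally common, which the law does not predict.
my %uniform = map { ($_ => 100) } 1 .. 9;
my %z       = benford_z(%uniform);
my $mean_z  = benford_z(%uniform);
for my $d (1 .. 9) {
    my $flag = $z{$d} > 1.96 ? ' (significant at 0.05)' : '';
    printf "digit %d: z = %.2f%s\n", $d, $z{$d}, $flag;
}
```

With uniform counts, digits near the middle of the range (such as 4) sit close to their expected share and are not flagged, while digit 1 is far below its expected share of roughly 30%.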

EXAMPLE

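    use Statistics::Benford;

    # Generate a list of numbers approximating a Benford distribution.
    my $max = 10;  # scale factor; the generated values are non-negative and unbounded
    # The denominator lies in (0, $max], which avoids division by zero.
    my @nums = map { ($max / ($max - rand($max))) - 1 } (1 .. 1_000);

    # Count how often each digit appears as the first non-zero digit.
    my %freq;
    for my $num (@nums) {
        my ($digit) = $num =~ /([1-9])/;  # find first non-zero digit
        next unless defined $digit;       # skip values with no non-zero digit
        $freq{$digit}++;
    }

    my $stats = Statistics::Benford->new(10, 0, 1);
    my $diff = $stats->diff(%freq);      # scalar context: sum of the differences
    my $signif = $stats->signif(%freq);  # scalar context: average z-statistic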

SEE ALSO

http://en.wikipedia.org/wiki/Benford's_law

http://www.mathpages.com/home/kmath302/kmath302.htm

NOTES

When counting the first digit, make sure it is non-zero. For example the first non-zero digit of 0.038 is 3.

Convert digits in a non-decimal base to their decimal representation. For example, to examine the first two digits of a hexadecimal number such as A1B2, take the first two digits, 'A1', and convert them to decimal: 161.
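The two preparation steps above can be sketched with core Perl only. The helper names first_digit and hex_key are hypothetical and not part of the module; hex is Perl's built-in conversion from a hexadecimal string to a number.

```perl
use strict;
use warnings;

# First non-zero digit of a decimal number, or undef if there is none.
sub first_digit {
    my ($num) = @_;
    my ($digit) = "$num" =~ /([1-9])/;
    return $digit;
}

# Leading $len hex digits (leading zeros skipped), converted to a decimal key.
sub hex_key {
    my ($hex, $len) = @_;
    (my $trimmed = $hex) =~ s/^0+//;
    return hex(substr($trimmed, 0, $len));
}

print first_digit('0.038'), "\n";    # prints 3
print hex_key('A1B2', 2), "\n";      # prints 161
```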

The law becomes less accurate when the data set is small.

The law does not apply to data sets which have imposed limitations (e.g. max or min values) or where the numbers are assigned (e.g. ids and phone numbers).

The distribution becomes nearly uniform by the 5th significant digit, i.e. all digits have almost the same expected frequency.
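This can be checked numerically. The sketch below, which does not use the module, sums the base-10 Benford probabilities over every four-digit prefix to obtain the expected frequency of each value of the 5th significant digit; the function name fifth_digit_prob is hypothetical.

```perl
use strict;
use warnings;

# Expected probability that the 5th significant digit (base 10) equals $v:
# the sum of log10(1 + 1/(10 * prefix + v)) over every prefix 1000..9999.
sub fifth_digit_prob {
    my ($v) = @_;
    my $p = 0;
    $p += log(1 + 1 / (10 * $_ + $v)) for 1000 .. 9999;
    return $p / log(10);
}

my %fifth = map { ($_ => fifth_digit_prob($_)) } 0 .. 9;
printf "%d: %.6f\n", $_, $fifth{$_} for 0 .. 9;
```

Every value lands very close to 0.1, with 0 marginally more likely than 9.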

It can help to partition the data into subsets for testing, e.g. testing negative and positive values separately.

REQUESTS AND BUGS

Please report any bugs or feature requests to http://rt.cpan.org/Public/Bug/Report?Queue=Statistics-Benford. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Statistics::Benford

You can also look for information at:

* GitHub Source Repository

http://github.com/gray/statistics-benford

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Statistics-Benford

* CPAN Ratings

http://cpanratings.perl.org/d/Statistics-Benford

* RT: CPAN's request tracker

http://rt.cpan.org/Public/Dist/Display.html?Name=Statistics-Benford

* Search CPAN

http://search.cpan.org/dist/Statistics-Benford

COPYRIGHT AND LICENSE

AUTHOR

gray, <gray at cpan.org>

