Log::Statistics - near-real-time statistics from log files


Contained in the Log-Statistics distribution.

NAME

Log::Statistics - near-real-time statistics from log files

VERSION

version 0.051

SYNOPSIS

    use Log::Statistics;

    my $log = Log::Statistics->new();

    # field 3 in the log contains the duration.  registering a
    # duration field causes duration information to be added to all
    # summary data.
    $log->register_field( "duration", 2 );

    # field 1 in the log contains transaction name.  add this field to
    # the list of fields for which a summary report will be generated
    $log->add_field( 0, "transaction" );

    # field 2 in the log contains the log status entry (e.g. 404).
    # don't generate a report on this field, but add it to the list of
    # defined fields.
    $log->register_field( "status", 1 );

    # collect data about transaction and status grouped together.
    # this will result in a break-down of all transactions by status.
    # note this is different than all statuses by transaction.
    $log->add_group( [ "transaction", "status" ] );

    # add a regular expression to capture the year, month, day, hour,
    # and minute from the time field.
    my $time_regexp = '^(\d{4}-\d{2}-\d{2}\s\d{2}\:\d{2})';
    $log->add_time_regexp( $time_regexp );

    # track overall response times per minute.  time is in field 6 in
    # the log
    $log->add_field( 5, "time" );

    # parse data in the log file
    $log->parse_text( $log_entries );

DESCRIPTION

Log::Statistics is a module for collecting summary data from a log file. For examples of what can be done with Log::Statistics, see the code and documentation in scripts/logstatsd. logstatsd contains a prototype implementation of several features which will eventually be migrated from scripts/logstatsd into this module.

The basic usage is to begin by creating a new Log::Statistics object. Next, register each field that you want to collect data about, indicating which column that data is in. Then add the fields or groups of fields for which you wish to collect statistics. Finally, use parse_text to add multiple entries or parse_line to add a single entry.

This module is alpha quality code, and is still under development. A number of the features currently implemented in logstatsd will eventually find their way back here.

SUBROUTINES/METHODS

Log::Statistics->new()

Create a new Log::Statistics object.

$log->register_field( $name, $column )

Define a field in the log, and indicate the column in which the field exists. Once a field has been registered, it can be used again later with add_group or add_field without having to re-specify the column number.

Registering a field does not automatically include the field in the report, except for the duration field. When a duration field has been defined, all data collected will contain information about durations.

$log->add_field( $column, $name, [ $threshold1, $threshold2, ... ] )

Collect summary data about the specified field. The column can be undef if the field has previously been registered using register_field().

For each field added to the report, summary data will be collected for each unique entry in the field. So for example, if a transaction field is added, then summary data will be collected about each unique transaction found in the log (e.g. the number of hits, total response times, etc).

Thresholds will only be honored if a duration field has been defined in the log (see THRESHOLDS below).
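
The per-value summary described above can be pictured with a small stand-alone sketch. It does not use Log::Statistics and is not the module's implementation; the sample lines, the %summary hash, and its "count" and "duration" keys are assumptions chosen to mirror the attributes shown in the example XML output.

```perl
use strict;
use warnings;

# Hypothetical comma-delimited entries: transaction, status, duration.
my @lines = (
    'login,200,120',
    'search,200,340',
    'login,404,80',
);

# Column numbers are zero-based, as in the SYNOPSIS.
my ( $field_col, $duration_col ) = ( 0, 2 );

# One summary record per unique value found in the chosen column.
my %summary;
for my $line (@lines) {
    my @cols = split /,/, $line;
    my $key  = $cols[$field_col];
    $summary{$key}{count}++;
    $summary{$key}{duration} += $cols[$duration_col];
}

for my $key ( sort keys %summary ) {
    printf "%s count=%d duration=%d\n",
        $key, $summary{$key}{count}, $summary{$key}{duration};
}
```

Each unique transaction name ends up with its own hit count and total duration, which is the kind of data a field report carries.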

$log->add_group( [ $field1, $field2, ... ], [ $threshold1, $threshold2, ... ] )

Collect summary data about two or more fields grouped together. The fields must have previously been defined using either add_field or register_field.

For each group added to the report, summary data will be collected for each unique combination of entries in the fields. For example, if a group is defined with "transaction" and "status", then summary information will be collected about each transaction broken down by the transaction status.

Note that a group for "transaction","status" is slightly different from "status","transaction". The former builds a data structure for each transaction that contains a hash with the summary data for each status. The latter builds a data structure for each status that contains a hash with the summary data for each transaction. Dumping the two data structures to XML using XML::Simple will result in different output. For more readable output, it is generally recommended that you put the field with the fewest possible unique values first.

Thresholds will only be honored if a duration field has been defined in the log (see THRESHOLDS below).
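
To make the effect of field order concrete, here is a stand-alone sketch of the two nesting orders. It is plain Perl rather than a call into Log::Statistics; the sample entries and the %by_transaction and %by_status hashes are hypothetical, and the module's internal layout may differ.

```perl
use strict;
use warnings;

# Hypothetical parsed entries: [ transaction, status, duration ].
my @entries = (
    [ 'mytrans1', 'GOOD', 100 ],
    [ 'mytrans1', 'BAD',  900 ],
    [ 'mytrans2', 'GOOD', 50 ],
);

my ( %by_transaction, %by_status );
for my $entry (@entries) {
    my ( $transaction, $status, $duration ) = @$entry;

    # Group "transaction","status": the outer key is the transaction.
    $by_transaction{$transaction}{$status}{count}++;
    $by_transaction{$transaction}{$status}{duration} += $duration;

    # Group "status","transaction": the outer key is the status.
    $by_status{$status}{$transaction}{count}++;
    $by_status{$status}{$transaction}{duration} += $duration;
}

print "outer keys by transaction: ", join( ',', sort keys %by_transaction ), "\n";
print "outer keys by status: ",      join( ',', sort keys %by_status ),      "\n";
```

Both hashes hold the same totals; only the nesting differs, which is why the serialized reports look different.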

$log->add_time_regexp( $regexp )

Define a regular expression which can be used to parse the time field. The regular expression should capture time to the resolution at which data should be collected. If you are parsing a log with many days of data, you may want to generate a report summarized by day. On the other hand, if your log contains many transactions over a short time period, you might want to break down the summary by activity per second.
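
As a rough illustration of how the captured text sets the reporting resolution, the sketch below buckets the same timestamps per minute and per day. The timestamp format, the $per_minute and $per_day patterns, and the count_buckets helper are assumptions made for this example and are not part of Log::Statistics.

```perl
use strict;
use warnings;

# Hypothetical time field values.
my @times = (
    '2006-01-05 00:01:12',
    '2006-01-05 00:01:48',
    '2006-01-05 00:02:03',
    '2006-01-06 09:30:00',
);

# What the first capture group keeps decides the bucket size.
my $per_minute = qr/^(\d{4}-\d{2}-\d{2}\s\d{2}:\d{2})/;
my $per_day    = qr/^(\d{4}-\d{2}-\d{2})/;

# Count entries per captured bucket; values that do not match are skipped.
sub count_buckets {
    my ( $regexp, @values ) = @_;
    my %count;
    for my $value (@values) {
        my ($bucket) = $value =~ $regexp or next;
        $count{$bucket}++;
    }
    return \%count;
}

my $minutes = count_buckets( $per_minute, @times );
my $days    = count_buckets( $per_day,    @times );

printf "%d minute buckets, %d day buckets\n",
    scalar keys %$minutes, scalar keys %$days;
```

A shorter capture gives fewer, larger buckets, so the choice of pattern is effectively the choice of report granularity.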

$log->add_line_regexp( $regexp )

Define a regular expression which can be used to parse the entire log entry and divide it up into a series of fields. This only needs to be defined if the entries are not single-line and comma-delimited.
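
A line regular expression for a log that is not comma-delimited might look like the sketch below. The entry format is hypothetical, and the idea that each capture group becomes one column, in order, is an assumption about how the fields would be numbered rather than something this documentation states.

```perl
use strict;
use warnings;

# Hypothetical entry format: [timestamp] transaction status duration
my $line_regexp = qr/^\[([^\]]+)\]\s+(\S+)\s+(\S+)\s+(\d+)$/;

my $line   = '[2006-01-05 00:01:12] login 200 120';
my @fields = $line =~ $line_regexp;

# Assumed mapping: capture groups become columns 0..3, in order.
print "$_: $fields[$_]\n" for 0 .. $#fields;
```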

$log->parse_text( $text )

Generate summary data about the log entries contained in $text.

If no fields or groups have been defined, only overall total data will be collected.

$log->parse_line( $line )

Similar to parse_text, except that only a single log entry is passed.

$log->add_filter_regexp( $regexp )

Add a regular expression filter. Any log entries that do not match the specified regular expression will not be processed.
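
The effect of a filter can be sketched in plain Perl as a grep over the entries. The sample lines and the $filter_regexp pattern are hypothetical; this shows only the matching rule described above, not the module's code.

```perl
use strict;
use warnings;

my @lines = (
    'login,200,120',
    'search,500,340',
    'login,404,80',
);

# Hypothetical filter: keep only entries for the "login" transaction.
my $filter_regexp = qr/^login,/;

# Entries that do not match the filter are dropped before processing.
my @kept = grep { $_ =~ $filter_regexp } @lines;
print "$_\n" for @kept;
```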

$log->save_data( $file )

Save the data collected to the specified file. Data will be stored in the YAML format.

$log->read_data( $file )

Load the data collected from the specified store file. Data can be stored using save_data.

$log->get_utime_from_string( $string )

Given a plain text date string from a log, convert it to unix time. Memoized to reduce the overhead of using Date::Manip.
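
The benefit of memoizing date parsing can be sketched with the core Memoize module. To stay self-contained, this example swaps Date::Manip for a stand-in parser built on the core Time::Local module; the utime_from_string function, its fixed 'YYYY-MM-DD HH:MM:SS' format, and the UTC interpretation are all assumptions for illustration.

```perl
use strict;
use warnings;
use Memoize;
use Time::Local qw(timegm);

my $parse_calls = 0;

# Stand-in parser for 'YYYY-MM-DD HH:MM:SS' strings, treated as UTC.
sub utime_from_string {
    my ($string) = @_;
    $parse_calls++;
    my ( $year, $mon, $mday, $hour, $min, $sec ) =
        $string =~ /^(\d{4})-(\d{2})-(\d{2})\s(\d{2}):(\d{2}):(\d{2})$/
        or return;
    return timegm( $sec, $min, $hour, $mday, $mon - 1, $year );
}

# Cache results so that a repeated string is parsed only once.
memoize('utime_from_string');

my $first  = utime_from_string('2006-01-05 00:01:00');
my $second = utime_from_string('2006-01-05 00:01:00');    # served from the cache

print "parsed $parse_calls time(s)\n";
```

Logs tend to repeat the same timestamp across many entries, so caching by input string avoids most of the parsing work.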

$log->get_xml()

Get XML report for log entries that have been processed.

$log->set_debug_nullvalues()

When this flag is set, any log entries containing a null value in any tracked fields will be printed to stderr.

Example XML Output

Here are some examples of the XML generated by Log::Statistics:

    # time field and duration field defined

    <?xml version="1.0" standalone="yes"?>
    <log-statistics>
      <fields name="time">
        <time name="2006/01/05 00:01" count="7" duration="1039" />
        <time name="2006/01/05 00:02" count="1" duration="129" />
        <time name="2006/01/05 00:03" count="7" duration="991" />
        <time name="2006/01/05 00:04" count="11" duration="1457" />
        <time name="2006/01/05 00:05" count="9" duration="2507" />
        <time name="2006/01/05 00:06" count="7" duration="1059" />
        <time name="2006/01/05 00:07" count="7" duration="1100" />
      </fields>
    </log-statistics>

    # group of status:transaction

    <?xml version="1.0" standalone="yes"?>
    <log-statistics>
      <xrefs name="status-transaction">
        <status name="BAD">
          <transaction name="mytrans1" count="3" duration="9589" />
        </status>
        <status name="GOOD">
          <transaction name="mytrans1" count="200" duration="880" />
          <transaction name="mytrans2" count="122" duration="187" />
        </status>
      </xrefs>
    </log-statistics>




THRESHOLDS

Thresholds allow monitoring the number of long response times. For example, a given transaction might be expected to complete within 5 seconds. In addition to measuring the average response time of the transaction, you may also wish to measure how many transactions do not complete within 5 seconds. You may define any number of thresholds, so you could measure those that you consider to be fast (under 3 seconds), good (under 5 seconds), slow (over 10 seconds), and very slow (over 20 seconds).

NOTE: If a duration field was not defined, then response time threshold statistics cannot be calculated.
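
One way to picture threshold counting is the stand-alone sketch below. The thresholds and durations are made-up values, and counting cumulatively (every entry at or under each threshold) is an assumption; this documentation does not say whether Log::Statistics reports cumulative counts or separate ranges.

```perl
use strict;
use warnings;

# Hypothetical thresholds and durations, both in seconds.
my @thresholds = ( 3, 5, 10, 20 );
my @durations  = ( 1.2, 4.0, 4.9, 7.5, 12.0, 25.0 );

# For each threshold, count the entries that completed within it.
my %within;
for my $threshold (@thresholds) {
    $within{$threshold} = grep { $_ <= $threshold } @durations;
}

my $total = @durations;
for my $threshold (@thresholds) {
    printf "within %2ds: %d of %d\n", $threshold, $within{$threshold}, $total;
}
```

Subtracting a count from the total gives the number of entries that exceeded that threshold, which is the "how many were too slow" figure described above.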

DEPENDENCIES

YAML - back end storage for log summary data

Date::Manip - for converting log times to unix time.

SEE ALSO

http://www.geekfarm.org/wu/muse/LogStatistics.html

http://en.wikipedia.org/wiki/Pivot_table

http://en.wikipedia.org/wiki/Crosstab

BUGS AND LIMITATIONS

Specifying a duplicate field or group definition will cause all values for the duplicated group(s) to be counted twice.

Please report problems to VVu@geekfarm.org

Patches are welcome.

AUTHOR

VVu@geekfarm.org

LICENCE AND COPYRIGHT
