| Parse-IRCLog documentation | Contained in the Parse-IRCLog distribution. |
Parse::IRCLog - parse internet relay chat logs
version 1.104
use Parse::IRCLog;
$result = Parse::IRCLog->parse("perl-2004-02-01.log");
my %to_print = ( msg => 1, action => 1 );
for ($result->events) {
next unless $to_print{ $_->{type} };
print "$_->{nick}: $_->{text}\n";
}
This module provides a simple framework to parse IRC logs in arbitrary formats.
A parser has a set of regular expressions for matching different events that occur in an IRC log, such as "msg" and "action" events. Each line in the log is matched against these rules and a result object, representing the event stream, is returned.
The rule set, described in greated detail below, can be customized by subclassing Parse::IRCLog. In this way, Parse::IRCLog can provide a generic interface for log analysis across many log formats, including custom formats.
Normally, the parse method is used to create a result set without storing a
parser object, but a parser may be created and reused.
newThis method constructs a new parser (with <$class-construct>>) and
initializes it (with <$obj-init>>). Construction and initialization are
separated for ease of subclassing initialization for future pipe dreams like
guessing what ruleset to use.
constructThe parser constructor just returns a new, empty parser object. It should be a blessed hashref.
initThe initialization method configures the object, loading its ruleset.
patternsThis method returns a reference to a hash of regular expressions, which are used to parse the logs. Only a few, so far, are required by the parser, although internally a few more are used to break down the task of parsing lines.
action matches an action; that is, the result of /ME in IRC. It should
return the following matches:
$1 - timestamp $2 - nick prefix $3 - nick $4 - the action
msg matches a message; that is, the result of /MSG (or "normal talking") in
IRC. It should return the following matches:
$1 - timestamp $2 - nick prefix $3 - nick $3 - channel $5 - the action
Read the source for a better idea as to how these regexps break down. Oh, and for what it's worth, the default patterns are based on my boring, default irssi configuration. Expect more rulesets to be included in future distributions.
parse($file)This method parses the file named and returns a Parse::IRCLog::Result object
representing the results. The parse method can be called on a parser object
or on the class. If called on the class, a parser will be instantiated for the
method call and discarded when parse returns.
parse_line($line)This method is used internally by parse to turn each line into an event.
While it could someday be made slick, it's adequate for now. It attempts to
match each line against the required patterns from the patterns result and
if successful returns a hashref describing the event.
If no match can be found, an "unknown" event is returned.
Write a few example subclasses for common log formats.
Add a few more default event types: join, part, nick. Others?
Possibly make the patterns sub an module, to allow subclassing to override
only one or two patterns. For example, to use the default nick pattern but
override the nick_container or action_leader. This sounds like a very
good idea, actually, now that I write it down.
Ricardo SIGNES <rjbs@cpan.org>
Copyright 2004 by Ricardo Signes.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
| Parse-IRCLog documentation | Contained in the Parse-IRCLog distribution. |
use strict; use warnings; package Parse::IRCLog; use Carp (); use Parse::IRCLog::Result; use Symbol ();
our $VERSION = '1.104';
sub new { my $class = shift; Carp::croak "new is a class method" if ref $class; $class->construct->init; }
sub construct { bless {} => shift; }
sub init { my $self = shift; $self->{patterns} = $self->patterns; $self; }
sub patterns { return $_[0]{patterns} if ref $_[0] and defined $_[0]{patterns}; my $p; # nick and chan are (mostly) specified in RFC2812, section 2.3.1 my $letter = qr/[\x41-\x5A\x61-\x7A]/; # A-Z / a-z my $digit = qr/[\x30-\x39]/; # 0-9 my $special = qr/[\x5B-\x60\x7B-\x7D]/; # [\]^_`{|} $p->{nick} = qr/( (?: $letter | $special ) (?: $letter | $digit | $special | - )* )/x; my $channelid = qr/[A-Z0-9]{5}/; my $chanstring = qr/[^\x00\a\r\n ,:]*/; $p->{chan} = qr/( (?: \# | \+ | !$channelid | & ) $chanstring (?: :$chanstring )? )/x; # the other regexes are more relevant to the way irssi formats logs $p->{nick_container} = qr/ < \s* ([+%@])? \s* $p->{nick} (?: : $p->{chan} )? \s* > /x; $p->{timestamp} = qr/\[?(\d\d:\d\d(?::\d\d)?)?\]?/; $p->{action_leader} = qr/\*/; $p->{msg} = qr/ $p->{timestamp} \s* $p->{nick_container} \s+ (.+) /x; $p->{action} = qr/ $p->{timestamp} \s* $p->{action_leader} \s+ ([%@])? \s* $p->{nick} \s (.+) /x; $p; }
sub parse { my $self = shift; $self = $self->new unless ref $self; my $symbol = Symbol::gensym; open $symbol, "<", $_[0] or Carp::croak "couldn't open $_[0]: $!"; my @events; push @events, $self->parse_line($_) while (<$symbol>); Parse::IRCLog::Result->new(@events); }
sub parse_line { my ($self, $line) = @_; if ($line) { return { type => 'msg', timestamp => $1, nick_prefix => $2, nick => $3, text => $5 } if $line =~ $self->patterns->{msg}; return { type => 'action', timestamp => $1, nick_prefix => $2, nick => $3, text => $4 } if $line =~ $self->patterns->{action}; } return { type => 'unknown', text => $line }; }
1;