| File-Bidirectional documentation | view source | Contained in the File-Bidirectional distribution. |
File::Bidirectional - Read a file line-by-line either forwards or backwards
use File::Bidirectional;
my $file = "/var/log/large_file";
# Object interface
# start from the last line
my $fh = File::Bidirectional->new($file, {origin => -1})
or die $!;
# read backwards until point of interest
while (my $line = $fh->readline()) {
last if $line =~ /RECORD_START/;
}
# switch directions
$fh->switch();
# read forwards until point of interest
while (my $line = $fh->readline()) {
last if $line =~ /RECORD_END/;
}
# Tied Handle Interface
local *F;
tie *F, "File::Bidirectional", $file, {origin => 1}
or die $!;
while (my $line = <F>) { ... }
(tied *F)->switch();
File::Bidirectional reads a file line-by-line in either the forwards or backwards direction. It supports an object interface as well as a tied filehandle interface, and should be straightforward to use. It is also memory efficient, since it is intended to be used on files too large to be efficiently slurped into an array and traversed backwards.
The direction in which to traverse the file can be changed at any time, but it is
important to note that the last-read line will be repeated when this happens.
See line_num to see why this is so.
On non-Unix platforms, this module attempts to imitate native Perl in converting the line endings. Currently, this is limited and untested, so please see LINE ENDINGS for more information.
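To illustrate why reading backwards does not require slurping the whole file, here is a minimal sketch in core Perl that reads fixed-size blocks from the end of the file and hands back lines last to first. It is not File::Bidirectional's actual implementation: the helper name make_backward_reader and the block size of 4 used in the demo are assumptions for illustration, and only "\n" separators are handled.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use File::Temp qw(tempfile);

# Hypothetical helper (not part of File::Bidirectional): returns a closure
# that yields the lines of $path from last to first, one block at a time.
sub make_backward_reader {
    my ($path, $block_size) = @_;
    $block_size ||= 8192;
    open my $fh, '<:raw', $path or return;
    my $pos = -s $fh;   # everything before $pos is still unread
    my $buf = '';       # possibly incomplete line at the start of the read region
    my @lines;          # complete lines waiting to be returned, in file order

    return sub {
        while (!@lines) {
            return undef if $pos == 0 && $buf eq '';    # nothing left
            if ($pos > 0) {
                my $len = $pos < $block_size ? $pos : $block_size;
                $pos -= $len;
                seek($fh, $pos, SEEK_SET) or die "seek failed: $!";
                my $got = read($fh, my $chunk, $len);
                die "read failed" unless defined $got && $got == $len;
                $buf = $chunk . $buf;
            }
            my @parts = split /(?<=\n)/, $buf;          # keep "\n" on each piece
            # Unless the start of the file has been reached, the first piece
            # may be the tail of a line that begins in an earlier block.
            $buf   = $pos > 0 ? shift @parts : '';
            @lines = @parts;
        }
        return pop @lines;
    };
}

{
    my ($out, $path) = tempfile(UNLINK => 1);
    binmode $out;
    print {$out} "first\nsecond\nthird\n";
    close $out or die "close failed: $!";

    my $next = make_backward_reader($path, 4) or die "open failed: $!";
    while (defined(my $line = $next->())) {
        print $line;    # third, second, first
    }
}
```

Only one block plus one partial line is held in memory at a time, which is the property that matters for very large files.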
I had a diff file describing the changes in a large (> 200MB) file. Based on
the line numbers in the diff, I had to repeatedly read backwards and
forwards in the large file to obtain the context lines before and after the
diff changes. The number of context lines varied, so it was a little more
involved than regenerating the diff with an appropriate --context option.
I decided to publish this module as I thought others might have similar needs. Reading large log files backwards is probably the most common of these, but if you have any other interesting uses, do let me know.
$fh = File::Bidirectional->new($file);
$fh = File::Bidirectional->new($file, {mode => 'forward'});
$fh = File::Bidirectional->new($file, {mode => 'backward'});
$fh = File::Bidirectional->new($file, {origin => -1});
$fh = File::Bidirectional->new($file, \%option);
Takes the file name as the first parameter, and a hashref of options as an
optional second parameter. Upon success, it returns the object. For invalid
parameters, it will croak (see Carp). If the underlying sysopen (see perlfunc)
fails, it returns undef and sets the error code in $! (see perlvar).
The valid options are:
The mode option can be either bi (bi-directional), forward or backward. The
forward and backward modes are restrictive: the file is read from the first and
last line respectively, and switching directions is prohibited. The bi mode
allows direction switching, and will start from the first line by default (use
the origin option to change that). The default is bi.
The origin option can be either 1 or -1. These denote whether the first or the
last line of the file is considered as line 1 by line_num. (readline will
always start from line 1.) origin can only be set if the mode option is bi. The
default is 1.
The binmode option can be any true or false expression. It is analogous to the
binmode built-in function (see perlfunc). On systems that distinguish between
binary and text files, notably DOS and Windows-based systems, this is important.
A true value will preserve \r\n as is; a false value will convert \r\n to \n.
The default is false.
The separator option can be any scalar string. It is analogous to the $/
variable (see perlvar). separator determines File::Bidirectional's notion of
what a line is. The default is the value of $/, which in turn defaults to "\n".
Caveat: The Perl-ish magic that occurs when $/ is "" (paragraph mode) does not
happen yet.
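Since the separator is described as analogous to $/, the following core-Perl sketch shows what a custom $/ does for native readline, including the paragraph-mode behaviour for "" that the caveat says is not yet reproduced. It reads from in-memory filehandles and does not call File::Bidirectional; the helper name read_records is an assumption for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from a string with native Perl I/O,
# using the given input record separator.
sub read_records {
    my ($text, $sep) = @_;
    open my $fh, '<', \$text or die "in-memory open failed: $!";
    local $/ = $sep;            # restored automatically when the sub returns
    my @records = <$fh>;
    close $fh;
    return @records;
}

my $text = "a\nb\n\n\nc\n";

my @by_line  = read_records($text, "\n");        # one record per "\n"
my @by_token = read_records("x%%y%%z", "%%");    # multi-character separator
my @by_para  = read_records($text, "");          # paragraph mode: runs of blank
                                                 # lines separate the records

printf "%d lines, %d tokens, %d paragraphs\n",
    scalar @by_line, scalar @by_token, scalar @by_para;
```

In paragraph mode the text splits into two records, whereas a literal "" handed to a string-based splitter would not behave that way, which is the gap the caveat refers to.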
Can be any true or false expression. It determines whether the separator
option is a regex or a string. The default is false.
Can be any positive integer. This is the size of a single block read by the underlying sysread in perlfunc. The default is 8192.
while (my $line = $fh->readline()) { ... }
Returns the subsequent line. This refers either to the next line when the
direction is forwards, or to the previous line when the direction is backwards.
The direction can be changed with switch. undef is returned when there
are no more lines to be read.
An alias for readline. It exists for compatibility with the IO::* classes.
Returns true when the next call to readline would return undef, and false otherwise.
$fh->switch();
Switches the current direction in which we are reading the file. It will
croak (see Carp) if the mode option in the constructor is set to forward or
backward.
Note that switching directions will cause the last-read line to be repeated by
readline.
$fh->close();
Closes the underlying filehandle and releases the memory allocated for its
buffer. On success it returns true, otherwise it returns false with the error
code found in $! (see perlvar). All subsequent readline calls will return
undef, and line_num will return its last value.
Takes an optional parameter: 1 for reading forwards or -1 for reading backwards; any other value will croak (see Carp). If an argument is provided, the direction will be switched if necessary. Either way, it returns the (new) direction.
my $fh = File::Bidirectional->new($file); my $n = $fh->line_num(); # n = 0
$fh->readline(); $n = $fh->line_num(); # n = 1
$fh->readline(); $n = $fh->line_num(); # n = 2
$fh->switch(); $n = $fh->line_num(); # n = 2
$fh->readline(); $n = $fh->line_num(); # n = 1
$fh->readline(); $n = $fh->line_num(); # n = 0
Returns the current line number. It is analogous to $. in perlvar.
For a file with n logical lines, the line number ranges from 0 to n. When reading away from the origin (forwards if the first line is the origin), its behavior is always identical to that of $. in perlvar - it refers to the number of lines that have been read. When reading towards the origin, it refers to the number of lines that can still be read.
When switch is called, the direction is changed, but the line number
remains the same. Therefore, the last-read line before changing directions will
be repeated by readline.
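The line-number rules can be modelled as a cursor sitting between lines. The sketch below is a toy in-memory model for the default case where the first line is the origin. It is not the module's code, and the package name ToyCursor is an assumption for illustration. It reproduces the 0, 1, 2, 2, 1, 0 sequence from the example and shows why the last-read line repeats after switch.

```perl
use strict;
use warnings;

package ToyCursor;

# The cursor {n} counts the lines lying between the origin and the cursor.
sub new {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], n => 0, dir => 1 }, $class;
}

sub line_num { return $_[0]{n} }

# Flipping the direction leaves the cursor (and so line_num) untouched.
sub switch { $_[0]{dir} *= -1; return $_[0]{dir} }

sub readline {
    my ($self) = @_;
    if ($self->{dir} == 1) {
        # Forwards: return the line just after the cursor, then step over it.
        return undef if $self->{n} >= @{ $self->{lines} };
        return $self->{lines}[ $self->{n}++ ];
    }
    # Backwards: step back over the line just before the cursor and return it.
    return undef if $self->{n} <= 0;
    return $self->{lines}[ --$self->{n} ];
}

package main;

my $cursor = ToyCursor->new("one\n", "two\n", "three\n");
my @seen;
push @seen, $cursor->readline for 1 .. 2;   # "one\n", "two\n"; line_num is 2
$cursor->switch;                             # line_num is still 2
push @seen, $cursor->readline;               # "two\n" again; line_num is 1
print @seen;
```

Because the cursor sits just after the last line read forwards, the first backward read has to step over that same line, hence the repetition.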
Returns the current position of the filehandle.
Returns the underlying filehandle. This is mainly useful for file-locking.
Notice that this actually breaks the encapsulation of File::Bidirectional; it therefore becomes the user's responsibility to ensure that nothing bad happens to the underlying filehandle. For example, it should definitely not be closed.
The underlying filehandle will be returned with its seek position set to what is
returned by tell. It should generally be okay for this seek position to be
modified (the object remembers its own seek position and will always restore
it). Any other operations on the filehandle, however, are very likely to void
your warranty. =)
local *F;
tie *F, "File::Bidirectional", $file, {origin => 1}
or die $!;
while (my $line = <F>) { ... }
(tied *F)->switch();
The TIEHANDLE, READLINE, EOF, CLOSE and TELL methods are aliased to the
constructor and the lower-case method names, respectively. All other tied
operations, such as seeking and writing, are unsupported and will generate an
unknown method error.
To use the other methods, it is necessary to get at the reference to the object underlying the tied variable via tied in perlfunc.
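For readers unfamiliar with tied handles, this sketch shows the mechanism in core Perl: reading from the tied glob dispatches to READLINE, and tied returns the object behind it so that extra methods can be called. The class ToyHandle and its in-memory lines are made up for illustration and merely stand in for File::Bidirectional; only scalar-context reads are handled.

```perl
use strict;
use warnings;

package ToyHandle;

# tie *F, 'ToyHandle', @lines ends up here.
sub TIEHANDLE {
    my ($class, @lines) = @_;
    return bless { lines => [@lines], n => 0, dir => 1 }, $class;
}

# <F> in scalar context ends up here.
sub READLINE {
    my ($self) = @_;
    if ($self->{dir} == 1) {
        return undef if $self->{n} >= @{ $self->{lines} };
        return $self->{lines}[ $self->{n}++ ];
    }
    return undef if $self->{n} <= 0;
    return $self->{lines}[ --$self->{n} ];
}

# close F ends up here.
sub CLOSE { return 1 }

# An ordinary method: not reachable through the handle, only through tied().
sub switch { $_[0]{dir} *= -1; return $_[0]{dir} }

package main;

local *F;
tie *F, 'ToyHandle', "one\n", "two\n" or die "tie failed";

my @got;
while (defined(my $line = <F>)) { push @got, $line }   # forwards
(tied *F)->switch;                                       # object behind the handle
while (defined(my $line = <F>)) { push @got, $line }   # backwards
close F;
print @got;
```

Operations with no corresponding method in the class (for example print or seek on F) would fail at run time, which mirrors the unsupported operations mentioned for the real module.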
File::Bidirectional attempts to imitate Perl by converting the
platform-specific line separator into \n. Currently, this only means
converting \r on MacOS, and \r\n on DOS and Windows-type systems (when the
binmode option is not set).
So far, this module has only been tested on Unix, where line endings do not need to be converted. It would be greatly appreciated if users could report whether the line-ending conversion works on their respective platforms.
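The conversion described here amounts to mapping a platform's native separator to "\n". A minimal sketch follows, assuming the platform is identified by a $^O-style string ('MSWin32', 'dos', 'MacOS') and that the binmode flag only affects the DOS and Windows branch. The function to_newline is hypothetical and is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: convert platform line endings in $line to "\012".
# Octal escapes are used so the sketch means the same bytes on every platform.
sub to_newline {
    my ($line, $os, $binmode) = @_;
    if ($os eq 'MSWin32' || $os eq 'dos') {
        # CRLF becomes LF unless the caller asked for binary mode.
        $line =~ s/\015\012/\012/g unless $binmode;
    }
    elsif ($os eq 'MacOS') {
        # Classic Mac OS used a bare CR as its line separator.
        $line =~ s/\015/\012/g;
    }
    return $line;    # Unix and anything else: left untouched
}

for my $case (['MSWin32', 0], ['MSWin32', 1], ['MacOS', 0], ['linux', 0]) {
    my ($os, $binmode) = @$case;
    my $input = $os eq 'MacOS' ? "text\015" : "text\015\012";
    my $out   = to_newline($input, $os, $binmode);
    printf "%-8s binmode=%d -> %s\n", $os, $binmode,
        join ' ', map { sprintf '%02x', ord } split //, $out;
}
```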
As would be expected, File::Bidirectional is hardly as fast as native Perl I/O. To break the news gently, it can be up to an order of magnitude slower...
Reading through a 250MB file with various methods yields the following numbers:
Method | Time (s)
--------------------------------------
Native Perl | 5
IO::File | 16
File::Bidirectional (OO) | 42
File::Bidirectional (tied) | 51
To be optimistic about it, in the best case File::Bidirectional takes about 2.6 times as long as IO::File (42 s versus 16 s). For smaller files, the absolute time difference may be less noticeable, so you will have to decide if the tradeoff is worth it for your application. It is about as fast as I can make it without dropping down into C, but if anybody has a compelling need for speed or ideas on how to optimize things, please do drop me a line.
The benchmarks were performed circa 2005, on a 2.8GHz Pentium 4 machine with a 7200rpm IDE hard disk, running Debian sarge and ext3. The programs tested were the respective variants of
while (my $line = <$fh>) { chomp $line; }
The record separator was simply "\n" and no newline translation took place.
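A rough way to repeat the native Perl and IO::File measurements on your own data is sketched below. It uses only core modules (Time::HiRes, IO::File, File::Temp). The helper names and the small generated test file are assumptions for illustration, and the timings it prints will not match the table. The File::Bidirectional variants would follow the same pattern using the constructor from the synopsis.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempfile);
use Time::HiRes qw(time);

# Native Perl: lexical filehandle and the <> operator.
sub count_native {
    my ($path) = @_;
    open my $fh, '<', $path or die "open $path: $!";
    my $n = 0;
    while (my $line = <$fh>) { chomp $line; $n++ }
    close $fh;
    return $n;
}

# IO::File: object interface with getline.
sub count_io_file {
    my ($path) = @_;
    my $fh = IO::File->new($path, 'r') or die "open $path: $!";
    my $n = 0;
    while (defined(my $line = $fh->getline)) { chomp $line; $n++ }
    $fh->close;
    return $n;
}

# Hypothetical timing wrapper: returns (result, elapsed wall-clock seconds).
sub timed {
    my ($code, @args) = @_;
    my $t0     = time();
    my $result = $code->(@args);
    return ($result, time() - $t0);
}

{
    my ($out, $path) = tempfile(UNLINK => 1);
    print {$out} "line $_\n" for 1 .. 10_000;
    close $out or die "close failed: $!";

    for my $case ([native => \&count_native], ['IO::File' => \&count_io_file]) {
        my ($name, $code) = @$case;
        my ($lines, $secs) = timed($code, $path);
        printf "%-10s %6d lines  %.4f s\n", $name, $lines, $secs;
    }
}
```

For meaningful numbers, point the helpers at a file large enough to dominate start-up costs and repeat each run several times, since disk caching can distort a single measurement.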
Kian Win Ong, cpan@bulk.squeakyblue.com
Copyright (C) 2005 by Kian Win Ong. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. This can be either the GNU General Public License or the Artistic License, as specified in the Perl README file.
Thanks go out to Uri Guttman, the author of File::ReadBackwards, from which I stole a bunch of code and tests. =)