| News-Overview documentation | Contained in the News-Overview distribution. |
News::Overview - an object to store condensed information about Usenet posts
use News::Overview;
use Net::NNTP;
my $overview = News::Overview->new();
my $nntp = new Net::NNTP;
$nntp->group("killfile.test");
$overview->add_from_nntp($nntp->xover);
foreach my $entry ( $overview->sort ('thread', $overview->entries) ) {
print $overview->print_entry($entry), "\n";
}
News::Overview objects store combined information about many messages, as generally done in INN's overview format and semi-codified in the XOVER extentions to RFC1036. Each object is meant to store a single newsgroup's worth of basic header information - by default the message number, subject, poster, date of posting, message identifier, references to the article's parents, size of the body, number of lines in the body, and information on where this message is stored within the server. This information is then used to offer summartes of messages in the group, sort the messages, and so forth.
The main unit of storage within News::Overview is the object News::Overview::Entry; each one of these contains information on a single article. News::Overview itself is dedicated to creating, storing, and manipulating these Entries.
All of this object's usage is contained within its functions.
Creates a new News::Overview object.
If DEFAULT_ARRAY_REF is offered, we will use this to define which
fields are stored in all the associated Entries; otherwise, we default to
the fields in @News::Overview::DEFAULT. The 'Number:' field is added
as well, to store the "article number" that each entry is associated with.
Returns the new blessed object, or undef if unsuccessful.
In array context, returns the full list of default information associated with each Entry. In scalar context, returns the same as an arrayref.
Same as default(), except this information is instead based on @News::Overview::DEFAULT (ie doesn't include Number:).
In array context, returns the list of fields stored in each associated Entry. In scalar context, returns this as an arrayref.
This differs from default() only in as much as everything after the ':' is trimmed; these are meant to be used as
Returns the (unsorted) array of News::Overview::Entry objects within the object.
Returns the number of News::Overview::Entry objects associated with this object.
These functions add new News::Overview::Entry items to the object, as parsed from several sources.
Actually does the work of inserting an Entry into the object. NUMBER
is the article number, which is used as they key for this Entry;
INFOARRAY is the list of information necessary for each Entry, sorted
by whatever function called this one.
Returns undef if there's already an entry matching the given NUMBER,
otherwise returns the new entry.
Reads in raw xover LINES (such as those created by print()) and creates
entries for each, using insert_entry(). Returns the number of Entries
that were succesfully added.
Reads in the information returned by Net::NNTP's xover() function, and and creates entries for each, using insert_entry(). Returns the number of Entries that were succesfully added.
Takes ARTICLE, a News::Article object, and generates the necessary
information to populate an Entry from it. NUMBER is the key that will
be associated with the article; we need it separately because we can't
really get it from the article directly. Returns 1 if successful, 0 if
not (roughly the same as add_xover() and add_from_nntp()).
These functions are used to sort the Entries within the News::Overview object.
Sort array ENTRIES based on SORTTYPE. Possible sorting types (case
insensitive):
thread Uses thread() to sort the messages date Sort (numerically) by the article time time Sort (numerically) by the article time lines Sort (numerically) by lines, then by time (other) Sort (with 'cmp') based on the value of the specified field, ie sort by 'From' or 'Subject', then by time
If SORTTYPE is prefixed with a '-', then we will return the entries in
revere order.
Returns the sorted array.
Sort ENTRIES by thread - that is, with articles that directly follow up
to a given article following the first article. The general algorithm:
Sort ENTRIES by depth and time of posting. For each entry, return the entry and its sorted children. No article is returned twice.
This doesn't quite work the way you'd expect it to; if the original parent
isn't there, any number of children may appear elsewhere, because there
was no common parent ENTRY to hold things together. The only solution
I can see is to look at parents as well, sorting them but not printing
them, which isn't currently being done; I may do this in a future version
of this package.
This function is fairly computationally intensive. It might be nice to cache this information somehow in some applications; I suspect that this would be a job for a different module, however. There's probably also some computational cruft that I haven't looked for yet.
These functions perform functions similar to those requested by Net::NNTP, and are therefore useful for creating modules dedicated to getting this information in other ways.
Returns an array reference to the field names, in order, that are stored in the Entries.
Returns a hash reference where the keys are the message numbers and the
values are array references containing the overview fields for that
message. MESSAGESPEC is parsed with Net::NNTP::Function's
messagespec() function to decide wich articles to get; FIELDS is an
array of fields to retrieve, which (if not offered) will default to the
value of fields().
We aren't currently dealing with the response if MESSAGESPEC is a
message-ID (or empty); we're assuming that it's just numbers. This is
wrong.
These functions offer printable versions of the overview information, which can be used for long-term storage.
Makes a printable version of all of the Entries in the object. Sorts the
entries based on SORT; FIELDS describes which fields to output;
defaults to fields(). The saved fields are separated with tabs, with all
other whitespace trimmed. This is suitable for saving out to a file and
later reading back in with add_xover().
Returns an array of lines of text containing the information in array context, or in scalar context returns these lines joined with newlines.
Print a specific entry's worth of information, as described above.
News::Overview::Entry, News::Article, Net::NNTP::Functions
News::Overview::Entry, News::Article, Net::NNTP::Functions
This was originally designed to be used with News::Archive and kiboze.pl; it eventually got dragged into News::Web as well, and so it became worth making into a separate function. It also didn't quite fit into my newslib project, since it might be worth optimizing this specifically in the future. Aah, well.
We should build xhdr(), xpat(), and other Net::NNTP functions into here, just like xover() and overview_fmt().
It would be nice if there was a way to say "return 500 entries" in an xover-type context, instead of "return 1 through 500"; sadly, since Net::NNTP->xover() doesn't have this, so I'll have to work out some other way to implement it.
We should be able to limit what we're returning in some more logical manner, ie with an SQL-type select() function - "return all entries posted between x and y dates", or "return all entries posted by user z", or whatever.
Tim Skirvin <tskirvin@killfile.org>
Copyright 2003 by Tim Skirvin <tskirvin@killfile.org>. This code may be distributed under the same terms as Perl itself.
| News-Overview documentation | Contained in the News-Overview distribution. |
$VERSION = "0.12"; package News::Overview; our $VERSION = "0.12"; # -*- Perl -*- # Fri Oct 10 11:29:51 CDT 2003 ############################################################################# # Written by Tim Skirvin <tskirvin@killfile.org>. Copyright 2003, Tim # Skirvin. Redistribution terms are below. #############################################################################
############################################################################### ### main() #################################################################### ############################################################################### use strict; use News::Article; use News::Overview::Entry; use Net::NNTP::Functions; use vars qw( @DEFAULT ); @DEFAULT = qw( Subject: From: Date: Message-ID: References: Bytes: Lines: Xref:full );
sub new { my ($proto, $default) = @_; $default ||= ""; my $class = ref($proto) || $proto; my $self = { Defaults => [ ref $default ? @{$default} : @DEFAULT ], # Count => 0, # Number of articles currently in here Articles => {}, # Actual article information Article_By_ID => {}, # Actual article information }; unshift @{$$self{Defaults}}, "Number:"; $$self{'Fields'} = [ _fields($$self{'Defaults'}) ]; bless $self, $class; $self; }
sub default { wantarray ? @{shift->{'Defaults'}} : shift->{'Defaults'} ; } sub defaults { _fields(@DEFAULT) } sub fields { wantarray ? @{shift->{'Fields'}} : shift->{'Fields'} ; } # =item value ( KEY [, VALUE ]) # # Returns the ...hey, wait a second, we're not doing anything with this! # # =cut # # sub value { # my ($self, $key, $value) = @_; # return undef unless $key; # $self->values->{$key} = $value if defined $value; # $self->values->{$key}; # } # # =item values ( ) # # =cut # # sub values { shift->{'Values'} }
sub entries { values %{shift->{Articles}} }
sub count { scalar values %{shift->{Articles}} || 0 }
sub insert_entry { my ($self, @info) = @_; my %hash; foreach ($self->fields) { $hash{$_} = shift @info || ""; } my $msgid = $hash{'Message-ID'}; # Don't do anything more if there's already an entry for this return undef if $self->{'Articles_By_ID'}->{$msgid}; my @refs = split(/\s+/, $hash{'References'} || ""); my $item = new News::Overview::Entry($msgid, \@refs, %hash); foreach (@refs) { my $artbyid = $self->{'Article_By_ID'}->{$_} || undef; if ($artbyid) { $artbyid->add_child($item) } } $self->{'Article_By_ID'}->{$msgid} = $item; my $number = $hash{'Number'}; # Ought to abort if we don't have this $self->{'Articles'}->{$number} = $item; $item; }
sub add_xover { my ($self, @lines) = @_; my $count = 0; foreach my $line (@lines) { chomp; my ($art, @info) = split(/\t/, $line); $self->insert_entry( $art, @info ) and $count++ } $$self{Count} += $count; $count; }
sub add_from_nntp { my ($self, %lines) = @_; my $count = 0; foreach my $art (keys %lines) { next unless ref $lines{$art}; my @info = @{$lines{$art}}; $self->insert_entry( $art, @info ) and $count++ } $$self{Count} += $count; $count; }
sub add_from_article { my ($self, $num, $article) = @_; return undef unless ($num && $article && ref $article); my @info; my @defaults = ref $self ? $self->default : @DEFAULT; foreach my $field (@defaults) { $field =~ s/:.*//; next if $field eq 'Number'; # next unless $field; if (lc $field eq 'lines') { push @info, $article->lines; } elsif (lc $field eq 'bytes') { push @info, $article->bytes; } elsif ($article->header($field)) { push @info, $article->header($field); } else { push @info, '' } } $self->insert_entry( $num, @info ) ? 1 : 0; }
sub sort { my ($self, $sort, @entries) = @_; $sort ||= ""; my ($reverse, $type) = $sort =~ m/^(\-?)(.*)$/; $type ||= 'Number'; my @return; if (lc $type eq 'thread') { # thread them @return = $self->thread(@entries); } elsif (lc $type eq 'lines') { @return = sort { ( $a->values->{ucfirst lc $type} <=> $b->values->{ucfirst lc $type} ) || ( $a->time <=> $b->time ) } @entries; } elsif ( lc $type eq 'date' || lc $type eq 'time' ) { @return = sort { ( $a->time <=> $b->time ) } @entries; } elsif ( grep { lc $_ eq lc $type } $self->fields ) { @return = sort { ( $a->values->{ucfirst lc $type} cmp $b->values->{ucfirst lc $type} ) || ( $a->time <=> $b->time ) } @entries; } else { @return = sort { $a->values->{Number} <=> $b->values->{Number} } @entries; } $reverse ? reverse @return : @return; }
sub thread { my ($self, @entries) = @_; my %added; my @return; foreach my $ent ( sort News::Overview::_bythread @entries ) { next unless ref $ent; my $parent = $ent->id; push @return, $ent unless ($added{$ent->id}++); my @children = $ent->children; my @tosort; foreach (@children) { push @tosort, $_ unless $added{$_->id} } next unless @tosort; foreach my $item ( $self->thread(@tosort) ) { next unless ref $item; # my $item = $$self{Article_By_ID}->{$_}; push @return, $item unless ($added{$item->id}++); } } @return; }
sub overview_fmt { my ($self) = @_; ref $self ? $self->default : \@DEFAULT; }
sub xover { my ($self, $spec, @fields) = @_; @fields = $self->fields unless scalar @fields; my ($first, $last) = messagespec($spec); # my ($first, $last) = split('-', $spec); $first ||= 0; my %entries; foreach my $key (keys %{$self->{Articles}}) { next if $key < $first; next unless ($last > 0 && $key <= $last); my $entry = $$self{Articles}->{$key}; # Should be able to get the article by ID too next unless $entry; my @over; foreach (@fields) { next if $_ eq 'Number'; # Skip the 'Number' field push @over, $entry->values->{$_} } $entries{$key} = \@over; } \%entries; }
sub print { my ($self, $sort, @fields) = @_; @fields = $self->fields unless scalar @fields; my @return; my @entries = $self->sort($sort, values %{$self->{Articles}}); foreach my $art (@entries) { push @return, $self->print_entry($art, @fields); } wantarray ? @return : join("\n", @return); }
sub print_entry { my ($self, $entry, @fields) = @_; return "" unless ($entry && ref $entry); @fields = $self->fields unless scalar @fields; my @over; foreach (@fields) { push @over, $entry->value($_) } map { s/\s/ /g; } @over; # Trim all whitespace join("\t", @over); }
############################################################################### ### Internal Functions ######################################################## ############################################################################### ### _fields ( ${@heads} ) # Retuns canonical names for the fields header. Takes an arrayref, # returns an array. sub _fields { my $heads = shift; my @heads = @{$heads}; map { s/:.*//g; lc $_ } @heads; @heads; } ### _bythread ( $a, $b ) # Sort function, to do sorts by thread - this means depth first, then # number. The actual "do children first" part is in thread(). sub _bythread { $a->depth <=> $b->depth || $a->time <=> $b->time # $a->values->{Number} <=> $b->values->{Number} } ### _bythread_basic ( $a, $b ) # More basic, depth-only search. sub _bythread_basic { $a->depth <=> $b->depth }
1; ############################################################################### ### Version History ########################################################### ############################################################################### # v0.01b Fri Oct 10 11:32:39 CDT 2003 ### First commented version (above date indicates the start of the comments) # v0.10b Fri Oct 10 15:25:43 CDT 2003 ### Took out some unnecessary code where necessary. Made a print_entry() ### function. # v0.11b Fri Oct 10 15:36:45 CDT 2003 ### Very minor documentation changes # v0.12 Thu Apr 22 13:19:25 CDT 2004 ### No real changes; internal code layout.