XML::RSS::Aggregate - RSS Aggregator


XML-RSS-Aggregate documentation Contained in the XML-RSS-Aggregate distribution.

NAME

XML::RSS::Aggregate - RSS Aggregator

SYNOPSIS

    my $rss = XML::RSS::Aggregate->new(
        # parameters for XML::RSS->channel()
        title   => 'Aggregated Examples',
        link    => 'http://blog.elixus.org/',

        # parameters for XML::RSS::Aggregate->aggregate()
        sources => [ qw(
            http://one.example.com/index.rdf
            http://another.example.com/index.rdf
            http://etc.example.com/index.rdf
        ) ],
        sort_by => sub {
            $_[0]->{dc}{subject}    # default to sort by dc:date
        },
        uniq_by => sub {
            $_[0]->{title}          # default to uniq by link
        }
    );

    $rss->aggregate( sources => [ ... ] );  # more items
    $rss->save("all.rdf");

DESCRIPTION

This module implements a subclass of XML::RSS, adding a single aggregate method that fetches other RSS feeds and adds their items to the object itself. It handles ordering and duplicate removal for the aggregated items.

The constructor new is also extended: it accepts the arguments of both channel and aggregate and passes them to those methods implicitly.

All the base methods are still applicable to this module; please see XML::RSS for details.

METHODS

aggregate (sources=>\@url, sort_by=>\&func, uniq_by=>\&func)

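This method fetches all RSS feeds listed in @url and passes their items to the object's add_item. Entries in @url that do not start with a URL scheme are skipped, and feeds that cannot be fetched or parsed are silently ignored.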

The optional sort_by argument specifies the function used to order RSS items, which are then added in descending order of the value it returns. By default, items are sorted by their {dc}{date} attribute (converted to an absolute timestamp), with ties broken by their {link} attribute.
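
As a minimal sketch of how a key of this kind can work, the snippet below builds a string from a timestamp padded to a fixed width, followed by the link, and orders items newest first with a plain string comparison. The helper name sort_key and the sample items (which carry a precomputed epoch field instead of a parsed dc:date) are assumptions made for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: fixed-width timestamp followed by the link.
# Padding keeps string comparison consistent with numeric order.
sub sort_key {
    my ($item) = @_;
    return sprintf("%20s", $item->{epoch}) . $item->{link};
}

my @items = (
    { epoch => 999,  link => 'http://one.example.com/a' },
    { epoch => 1000, link => 'http://one.example.com/b' },
    { epoch => 1000, link => 'http://one.example.com/c' },
);

# Newest first; equal timestamps fall back to comparing the link.
my @sorted = sort { sort_key($b) cmp sort_key($a) } @items;
my @links  = map { $_->{link} } @sorted;
```

Without the padding, "999" would compare greater than "1000" as a string, so the older item would be placed first.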

The optional uniq_by argument specifies the function used to remove duplicate RSS items; it defaults to removing items that have the same {link} value.
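
The effect of a uniq_by callback can be sketched on plain hashes, without fetching anything. In the snippet below, the helper uniq_items and the sample items are hypothetical; the sketch only shows that the first item seen for each key is kept and that changing the key function changes which items count as duplicates.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first item seen for each key.
sub uniq_items {
    my ($key_of, @items) = @_;
    my %seen;
    return grep { !$seen{ $key_of->($_) }++ } @items;
}

my @items = (
    { title => 'Hello', link => 'http://one.example.com/1' },
    { title => 'Hello', link => 'http://another.example.com/1' },
    { title => 'World', link => 'http://one.example.com/1' },
);

# Keyed by link, the third item repeats the first and is dropped.
my @by_link  = uniq_items(sub { $_[0]->{link} },  @items);

# Keyed by title, the second item repeats the first and is dropped.
my @by_title = uniq_items(sub { $_[0]->{title} }, @items);
```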

SEE ALSO

XML::RSS

AUTHORS

Autrijus Tang <autrijus@autrijus.org>

COPYRIGHT

# $File: //member/autrijus/XML-RSS-Aggregate/lib/XML/RSS/Aggregate.pm $ $Author: autrijus $
# $Revision: #4 $ $Change: 2924 $ $DateTime: 2002/12/25 15:04:33 $

package XML::RSS::Aggregate;
$XML::RSS::Aggregate::VERSION = '0.02';

use strict;
use XML::RSS;
use base 'XML::RSS';

use Date::Parse;
use LWP::Simple 'get';
use HTML::Entities 'encode_entities';

sub new {
    my ($class, %args) = @_;

    my $version = delete($args{version}) || '1.0';
    my $self    = $class->SUPER::new( version => $version );

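    # Pull out the aggregate() arguments so only channel() fields remain.
    my $sources = delete($args{sources});
    my $sort_by = delete($args{sort_by});
    my $uniq_by = delete($args{uniq_by});

    $self->channel(%args) if %args;
    $self->aggregate(
        sources => $sources,
        sort_by => $sort_by,
        uniq_by => $uniq_by,
    ) if $sources;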

    return $self;
}

sub aggregate {
    my ($self, %args) = @_;

    my $sources = $args{sources} or return;
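    my $sort_by = $args{sort_by} || sub {
        my $date = $_[0]->{dc}{date};
        # Turn a trailing ":NN" into "NN" (e.g. "+08:00" becomes "+0800")
        # before handing the date to str2time.
        $date =~ s/:(\d\d)$/$1/ if $date;
        # Fixed-width timestamp, then the link as a tie-breaker.
        sprintf("%20s", str2time($date)).$_[0]->{link}
    };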
    my $uniq_by = $args{uniq_by} || sub {
        $_[0]->{link}
    };

    my $old_items = $self->{items} || [];
    $self->{items} = [];

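    # The pipeline below reads bottom to top: fetch and parse each source,
    # escape &, < and > in the new items' fields, drop duplicates (existing
    # items come first), compute sort keys, then re-add in descending order.
    my %saw;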
    $self->add_item(%{$_->[0]}) for
        sort { $b->[1] cmp $a->[1] }
        grep { $_->[1] }
        map  { [ $_ => scalar($sort_by->($_)) ] }
        grep { !$saw{$uniq_by->($_)}++ } @{$old_items},
        map  { encode_entities($_, '&<>') for grep {!ref($_)} values %{$_}; $_ }
        map  { encode_entities($_, '&<>') for grep {!ref($_)} values %{$_->{dc}}; $_ }
        map  { encode_entities($_, '&<>') for grep {!ref($_)} values %{$_->{syn}}; $_ }
        map  { encode_entities($_, '&<>') for grep {!ref($_)} @{$_->{taxo}}; $_ }
        map  { eval { (my $rss = XML::RSS->new)->parse(get($_)); @{$rss->{items}} } }
        grep { /^\w+:/ } @{$sources};

    return $self;
}

1;

__END__
# Local variables:
# c-indentation-style: bsd
# c-basic-offset: 4
# indent-tabs-mode: nil
# End:
# vim: expandtab shiftwidth=4: