NAME

WWW::Blog::Metadata - Extract common metadata from weblogs

SYNOPSIS

        use WWW::Blog::Metadata;
        use Data::Dumper;
        my $uri;
        my $meta = WWW::Blog::Metadata->extract_from_uri($uri)
            or die WWW::Blog::Metadata->errstr;
        print Dumper $meta;

DESCRIPTION

WWW::Blog::Metadata extracts common metadata from weblogs: syndication feed URIs, FOAF URIs, locative information, etc. Some benefits of using WWW::Blog::Metadata:

USAGE
WWW::Blog::Metadata->extract_from_uri($uri) Given a URI $uri pointing to a weblog, fetches the page contents, and attempts to extract common metadata from that weblog.

On error, returns "undef", and the error message can be obtained by calling WWW::Blog::Metadata->errstr.

On success, returns a WWW::Blog::Metadata object.

WWW::Blog::Metadata->extract_from_html($html [, $base_uri ]) Uses the same extraction mechanism as extract_from_uri, but assumes that you've already fetched the HTML document and will provide it in $html, which should be a reference to a scalar containing the HTML.

If you know the base URI of the document, you should provide it in $base_uri. WWW::Blog::Metadata will attempt to find the base URI of the document if it's specified in the HTML itself, but you can give it a head start by passing in $base_uri.

This method has the same return value as extract_from_uri.

$meta->feeds
A reference to a hash of syndication feed URIs.

(Note: these are currently extracted using Feed::Find, which requires a separate parsing step, and sort of renders the above benefit #1 somewhat of a lie. This is done for maximum correctness, but it's possible it could change at some point.)

$meta->foaf_uri
The URI for a FOAF file, specified in the standard manner used for FOAF auto-discovery.

$meta->lat
$meta->lon
The latitude and longitude specified for the weblog, from either icbm or geo.position <meta /> tags.

$meta->generator
The tool that generated the weblog, from a generator <meta /> tag.

PLUGINS

There are endless amounts of metadata that you might want to extract from a weblog, and the methods above are only what are provided by default. If you'd like to extract more information, you can use WWW::Blog::Metadata's plugin architecture to build access to the metadata that you want, while while making only one parsing pass over the HTML document.

The plugin architecture uses Module::Pluggable::Ordered, and it provides 2 pluggable events:

            package WWW::Blog::Metadata::Flavor;
            use strict;

            use WWW::Blog::Identify qw( identify );
            use WWW::Blog::Metadata;
            WWW::Blog::Metadata->mk_accessors(qw( flavor ));

            sub on_got_html {
                my $class = shift;
                my($meta, $html, $base_uri) = @;
                $meta->flavor( identify($baseuri, $$html) );
            }
            sub on_got_html_order { 99 }

            1;

        This automatically adds a new accessor to the $meta object that is
        returned from the extract_from_* methods, so you can call

            my $meta = WWW::Blog::Metadata->extract_from_uri($uri);
            print $meta->flavor;

        to retrieve the name of the identified weblogging tool.
            package WWW::Blog::Metadata::RSD;
            use strict;

            use WWW::Blog::Metadata;
            WWW::Blog::Metadata->mk_accessors(qw( rsd_uri ));

            sub on_got_tag {
                my $class = shift;
                my($meta, $tag, $attr, $base_uri) = @;
                if ($tag eq 'link' && $attr->{rel} =~ /\bEditURI\b/i &&
                    $attr->{type} eq 'application/rsd+xml') {
                    $meta->rsduri(URI->new_abs($attr->{href}, $base_uri)->as_string);
                }
            }
            sub on_got_tag_order { 99 }

            1;

        This automatically adds a new accessor to the $meta object that is
        returned from the extract_from_* methods, so you can call

            my $meta = WWW::Blog::Metadata->extract_from_uri($uri);
            print $meta->rsd_uri;

        to retrieve the URI for the RSD file.

LICENSE

WWW::Blog::Metadata is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR & COPYRIGHT

Except where otherwise noted, WWW::Blog::Metadata is Copyright 2005 Benjamin Trott, ben+cpan@stupidfool.org. All rights reserved.