SWISH::Prog::Aggregator::MailFS - crawl a filesystem of email messages


SWISH-Prog documentation Contained in the SWISH-Prog distribution.

Index


Code Index:

NAME

Top

SWISH::Prog::Aggregator::MailFS - crawl a filesystem of email messages

SYNOPSIS

Top

 use SWISH::Prog::Aggregator::MailFS;
 my $fs = SWISH::Prog::Aggregator::MailFS->new(
        indexer => SWISH::Prog::Indexer->new
    );

 $fs->indexer->start;
 $fs->crawl( $path_to_mail );
 $fs->indexer->finish;

DESCRIPTION

Top

SWISH::Prog::Aggregator::MailFS is a subclass of SWISH::Prog::Aggregator::FS that expects every file in a filesystem to be an email message. This class is useful for crawling a file tree like those managed by ezmlm.

NOTE: This class will not work with personal email boxes in the Mbox format. It might work with maildir format, but that is coincidental. Use SWISH::Prog::Aggregator::Mail to handle your personal email box. Use this class to handle mail archives as with a mailing list.

METHODS

Top

See SWISH::Prog::Aggregator::FS. Only new or overridden methods are documented here.

init

Constructor.

file_ok( full_path )

Like the parent class method, but ignores file extension, assuming that all files are email messages.

Returns the full_path value if the file is ok for indexing; returns 0 if not ok.

get_doc( url )

Overrides parent class to delegate the creation of the SWISH::Prog::Doc object to SWISH::Prog::Aggregator::Mail->get_doc().

Returns a SWISH::Prog::Doc object.

AUTHOR

Top

Peter Karman, <perl@peknet.com>

BUGS

Top

Please report any bugs or feature requests to bug-swish-prog at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=SWISH-Prog. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

Top

You can find documentation for this module with the perldoc command.

    perldoc SWISH::Prog




You can also look for information at:

* Mailing list

http://lists.swish-e.org/listinfo/users

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=SWISH-Prog

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/SWISH-Prog

* CPAN Ratings

http://cpanratings.perl.org/d/SWISH-Prog

* Search CPAN

http://search.cpan.org/dist/SWISH-Prog/

COPYRIGHT AND LICENSE

Top

SEE ALSO

Top

http://swish-e.org/


SWISH-Prog documentation Contained in the SWISH-Prog distribution.
package SWISH::Prog::Aggregator::MailFS;
use strict;
use warnings;
use base qw( SWISH::Prog::Aggregator::FS );
use Path::Class ();
use SWISH::Prog::Aggregator::Mail;    # delegate doc creation
use Carp;
use Data::Dump qw( dump );

our $VERSION = '0.51';

sub init {
    my $self = shift;
    $self->SUPER::init(@_);

    # cache a Mail aggregator to use its get_doc method
    $self->{_mailer} = SWISH::Prog::Aggregator::Mail->new(
        indexer => $self->indexer,
        verbose => $self->verbose,
        debug   => $self->debug,
    );

    return $self;
}

sub file_ok {
    my $self      = shift;
    my $full_path = shift;
    my $stat      = shift;

    $self->debug and warn "checking file $full_path\n";

    return 0 if $full_path =~ m![\\/](\.svn|RCS)[\\/]!; # TODO configure this.

    $stat ||= [ stat($full_path) ];
    return 0 unless -r _;
    return 0 if -d _;
    if (    $self->ok_if_newer_than
        and $self->ok_if_newer_than >= $stat->[9] )
    {
        return 0;
    }
    return 0
        if ( $self->_apply_file_rules($full_path)
        && !$self->_apply_file_match($full_path) );

    $self->debug and warn "  $full_path -> ok\n";
    if ( $self->verbose & 4 ) {
        local $| = 1;    # don't buffer
        print "crawling $full_path\n";
    }

    return $full_path;
}

sub get_doc {
    my $self = shift;

    # there's some wasted overhead here in creating a
    # SWISH::Prog::Doc 2x. But we're optimizing here for
    # developer time...

    # mostly a slurp convenience
    my $doc = $self->SUPER::get_doc(@_);

    #carp "first pass for raw doc: " . dump($doc);

    # get the "folder"
    my $folder = Path::Class::file( $doc->url )->dir;

    # now convert the buffer to an email message
    my $msg = Mail::Message->read( \$doc->content );

    # and finally convert to the SWISH::Prog::Doc we intend to return
    my $mail = $self->{_mailer}->get_doc( $folder, $msg );
    
    # reinstate original url from filesystem
    $mail->url($doc->url);

    #carp "second pass for mail doc: " . dump($mail);

    return $mail;
}

1;

__END__