SWISH::Filters::xls2txt - convert Excel docs to text using xls2csv


SWISH-Filter documentation Contained in the SWISH-Filter distribution.


NAME

SWISH::Filters::xls2txt - convert Excel docs to text using xls2csv

DESCRIPTION

This is a plug-in module that uses the xls2csv program to convert MS Excel documents to text for indexing by Swish-e. xls2csv is part of the catdoc package and can be downloaded from:

    http://www.45.free.net/~vitus/software/catdoc/

The program xls2csv must be installed and in your PATH.
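One way to confirm the PATH requirement before indexing is sketched below. This is only an illustration, not the lookup SWISH::Filter itself performs; the helper name find_in_path is hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of SWISH::Filter): return the full path of
# the first executable file named $program found in the PATH directories,
# or undef if there is none.
sub find_in_path {
    my ($program) = @_;
    for my $dir ( File::Spec->path ) {
        my $candidate = File::Spec->catfile( $dir, $program );
        return $candidate if -f $candidate && -x _;
    }
    return;
}

my $xls2csv = find_in_path('xls2csv');
print defined $xls2csv
    ? "xls2csv found at $xls2csv\n"
    : "xls2csv not found in PATH\n";
```

If the second message is printed, install the catdoc package or add its directory to PATH before running the indexer.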

BUGS

This filter does not specify input or output character encodings.

A minor optimization during spidering (i.e. when docs are in memory instead of on disk) would be to use an open2() call to let xls2csv read from stdin instead of from a file.
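The open2() idea can be sketched roughly as follows. This is a hedged illustration, not code from the filter: the helper name filter_via_stdin is hypothetical, and whether xls2csv will actually accept a spreadsheet on stdin is an assumption that must be checked against the installed catdoc version. The demo therefore pipes text through the perl interpreter itself as a stand-in command.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical helper: send $content to the stdin of @cmd and return
# whatever the command writes to stdout, or undef on a non-zero exit.
# Caveat: all input is written before any output is read, so a child that
# emits a lot of output before consuming its input could deadlock.
sub filter_via_stdin {
    my ( $content, @cmd ) = @_;

    my ( $from_child, $to_child );
    my $pid = open2( $from_child, $to_child, @cmd );

    binmode $to_child;           # spreadsheet data is binary
    print {$to_child} $content;
    close $to_child;             # send EOF so the child can finish

    my $output = do { local $/; <$from_child> };
    close $from_child;
    waitpid $pid, 0;             # reap the child; exit status is in $?

    return $? == 0 ? $output : undef;
}

# Stand-in command: uppercase the input using perl itself.
print filter_via_stdin( "a,b\n", $^X, '-pe', '$_ = uc' );
```

Because open2() is called in list form, no shell is involved and the arguments need no quoting. A production version would also need to handle large documents without blocking.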

AUTHOR

Peter Karman <perl@peknet.com>

SEE ALSO

SWISH::Filter



package SWISH::Filters::xls2txt;
use strict;
use vars qw( $VERSION @ISA );
$VERSION = '0.15';
@ISA = ('SWISH::Filters::Base');

sub new {
    my $class = shift;
    my $self  = bless {
        mimetypes => [ qr!application/vnd\.ms-excel!, qr!application/excel!, ],
        priority  => 55,                             # higher than XLtoHTML
    }, $class;

    # check for helpers
    return $self->set_programs('xls2csv');

}

sub filter {
    my ( $self, $doc ) = @_;

    my $content = $self->run_xls2csv( $doc->fetch_filename ) || return;

    # update the document's content type
    $doc->set_content_type('text/plain');

    # return the document
    return \$content;
}
1;

__END__