SWISH::Prog::Doc - Document object class for passing to SWISH::Prog::Indexer


SWISH-Prog documentation Contained in the SWISH-Prog distribution.

NAME

SWISH::Prog::Doc - Document object class for passing to SWISH::Prog::Indexer

SYNOPSIS

  # subclass SWISH::Prog::Doc
  # and override filter() method

  package MyDoc;
  use base qw( SWISH::Prog::Doc );

  sub filter {
    my $doc = shift;

    # alter url
    my $url = $doc->url;
    $url =~ s/my\.foo\.com/my.bar.org/;    # escape dots so they match literally
    $doc->url( $url );

    # alter content
    my $buf = $doc->content;
    $buf =~ s/foo/bar/gi;
    $doc->content( $buf );
  }

  1;

DESCRIPTION

SWISH::Prog::Doc is the base class for Doc objects in the SWISH::Prog framework. Doc objects are created by SWISH::Prog::Aggregator classes and processed by SWISH::Prog::Indexer classes.

You can subclass SWISH::Prog::Doc and add a filter() method to alter the values of the Doc object before it is indexed.

METHODS

All of the following methods may be overridden when subclassing this module, but the recommendation is to override only filter().

new

Instantiate Doc object.

All of the following params are also available as accessors/mutators.

url
type
content
parser
modtime
size
action
debug
charset
data
version

Swish-e 2.x or Swish3 style headers. Value should be 2 or 3. The default is 2, or 3 if the SWISH3 environment variable is set to a true value.

init

Sets default values for charset and version when none were supplied, then calls filter() on the object.

filter

Override this method to alter the values in the object prior to it being process()ed by the Indexer.

The default is to do nothing.

This method can also be set using the filter() callback in SWISH::Prog->new().
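
The relationship between init() and filter() described above is a hook pattern: the base class calls a do-nothing method that a subclass may replace. The following is a minimal, self-contained sketch of that pattern. Demo::Base and Demo::Upper are hypothetical stand-ins, not part of SWISH::Prog, and having new() call init() is an assumption of this sketch rather than a statement about SWISH::Prog::Class.

```perl
use strict;
use warnings;

{
    # Hypothetical base class standing in for SWISH::Prog::Doc.
    package Demo::Base;

    sub new {
        my ( $class, %args ) = @_;
        my $self = bless {%args}, $class;
        $self->init;    # assumption: the constructor triggers init()
        return $self;
    }

    sub init {
        my $self = shift;
        $self->{version} ||= 2;    # apply a default before filtering
        $self->filter;             # hook: subclasses may override this
        return $self;
    }

    sub filter { }                 # the default is to do nothing

    # Combined accessor/mutator, in the style of the Doc accessors.
    sub content {
        my $self = shift;
        $self->{content} = shift if @_;
        return $self->{content};
    }
}

{
    # Hypothetical subclass that overrides only filter().
    package Demo::Upper;
    our @ISA = ('Demo::Base');

    sub filter {
        my $doc = shift;
        $doc->content( uc $doc->content );
    }
}

my $plain    = Demo::Base->new( content => 'foo' );
my $filtered = Demo::Upper->new( content => 'foo' );

print $plain->content,    "\n";    # prints "foo"
print $filtered->content, "\n";    # prints "FOO"
```

Because the hook runs inside init(), the subclass never needs to touch the constructor, which is why overriding only filter() is usually enough.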

as_string

Return the Doc object rendered as a scalar string, ready to be indexed. This will include the proper headers. See SWISH::Prog::Headers.

NOTE: as_string() is also called whenever a Doc object is used in string context, because the class overloads the stringification operator. Example:

 print $doc->as_string;     # one way
 print $doc;                # same thing
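
The equivalence of the two print statements comes from Perl's overload pragma, which the module source uses to map string context to as_string(). Below is a small self-contained sketch of that mechanism. Demo::Doc is a hypothetical class, and the "URL:" header line is a placeholder format for illustration only, not the output of SWISH::Prog::Headers.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in class; not part of SWISH::Prog.
    package Demo::Doc;

    use overload
        '""'     => \&as_string,    # string context calls as_string()
        'bool'   => sub {1},        # boolean context is true without building the string
        fallback => 1;              # let Perl derive other operators (eq, ., etc.)

    sub new {
        my ( $class, %args ) = @_;
        return bless {%args}, $class;
    }

    sub as_string {
        my $self = shift;

        # Placeholder header format, for illustration only.
        return "URL: $self->{url}\n\n$self->{content}";
    }
}

my $doc = Demo::Doc->new( url => 'http://example.com/', content => 'hello' );

my $explicit = $doc->as_string;    # direct method call
my $implicit = "$doc";             # interpolation triggers the overload

print "same output\n" if $explicit eq $implicit;
```

The 'bool' entry means a test such as `if ($doc)` does not have to render the whole document just to decide truthiness.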

AUTHOR

Peter Karman, <perl@peknet.com>

BUGS

Please report any bugs or feature requests to bug-swish-prog at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=SWISH-Prog. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc SWISH::Prog




You can also look for information at:

* Mailing list

http://lists.swish-e.org/listinfo/users

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=SWISH-Prog

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/SWISH-Prog

* CPAN Ratings

http://cpanratings.perl.org/d/SWISH-Prog

* Search CPAN

http://search.cpan.org/dist/SWISH-Prog/

COPYRIGHT AND LICENSE

SEE ALSO

http://swish-e.org/


package SWISH::Prog::Doc;
use strict;
use warnings;
use Carp;
use Data::Dump qw( dump );
use base qw( SWISH::Prog::Class );
use overload(
    '""'     => \&as_string,
    'bool'   => sub {1},
    fallback => 1,
);

use SWISH::Prog::Headers;

our $VERSION = '0.51';

__PACKAGE__->mk_accessors(
    qw( url modtime type parser content action size charset data version ));

my $default_version = $ENV{SWISH3} ? 3 : 2;

my ( $locale, $lang, $charset );
{

    # inside a block to reduce impact on any regex
    use POSIX qw(locale_h);
    use locale;

    $locale = setlocale(LC_CTYPE);
    ( $lang, $charset ) = split( m/\./, $locale );
    $charset ||= 'iso-8859-1';
}

sub init {
    my $self = shift;
    $self->SUPER::init(@_);
    $self->{charset} ||= $charset;
    $self->{version} ||= $default_version;
    $self->filter();
    return $self;
}

sub filter { }

# TODO cache this higher up? how else to set debug??
my $headers = SWISH::Prog::Headers->new();

sub as_string {
    my $self = shift;

    # we ignore size() and let Headers compute it based on actual content()
    return $headers->head(
        $self->content,
        {   url     => $self->url,
            modtime => $self->modtime,
            type    => $self->type,
            action  => $self->action,
            parser  => $self->parser,
            version => $self->version,
        }
    ) . $self->content;

}

1;

__END__