HTML::StripScripts::Parser - XSS filter using HTML::Parser


HTML-StripScripts-Parser documentation Contained in the HTML-StripScripts-Parser distribution.

NAME

HTML::StripScripts::Parser - XSS filter using HTML::Parser

SYNOPSIS

  use HTML::StripScripts::Parser();

  my $hss = HTML::StripScripts::Parser->new(

       {
           Context => 'Document',       ## HTML::StripScripts configuration
           Rules   => { ... },
       },

       strict_comment => 1,             ## HTML::Parser options
       strict_names   => 1,

  );

  $hss->parse_file("foo.html");

  print $hss->filtered_document;

  OR

  print $hss->filter_html($html);

DESCRIPTION

This class provides an easy interface to HTML::StripScripts, using HTML::Parser to parse the HTML.

See HTML::Parser for details of how to customise the way raw HTML is parsed into tags, and HTML::StripScripts for details of how to customise the way those tags are filtered.
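As a rough illustration of that division of labour, the sketch below filters a string containing a script element. It assumes HTML::StripScripts::Parser is installed, and it uses the `Flow` context of HTML::StripScripts (an assumption here, chosen because the input is an HTML fragment rather than a whole document). The exact filtered markup depends on your HTML::StripScripts version and configuration, so it is printed rather than spelled out.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser ();

# Assumption: 'Flow' context, i.e. the input is an HTML fragment rather
# than a complete <html> document.
my $hss = HTML::StripScripts::Parser->new( { Context => 'Flow' } );

# HTML::Parser tokenises this string; HTML::StripScripts decides which
# tags and attributes survive. The <script> element should not.
my $input  = '<p>Hello <b>world</b></p><script>alert("xss")</script>';
my $output = $hss->filter_html($input);

print "$output\n";
```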

CONSTRUCTORS

new ( {CONFIG}, [PARSER_OPTIONS] )

Creates a new HTML::StripScripts::Parser object.

The CONFIG parameter has the same semantics as the CONFIG parameter to the HTML::StripScripts constructor.

Any PARSER_OPTIONS supplied will be passed on to the HTML::Parser init method, allowing you to influence the way the input is parsed.

You cannot use PARSER_OPTIONS to set the HTML::Parser event handlers (see Events in HTML::Parser) since HTML::StripScripts::Parser uses all of the event hooks itself. However, you can use Rules (see Rules in HTML::StripScripts) to customise the handling of all tags and attributes.
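The sketch below passes both kinds of argument. `Context` and `Rules` belong to the HTML::StripScripts configuration; the rule `b => 0` is assumed here to reject `<b>` tags outright (check "Rules" in HTML::StripScripts for the exact semantics in your version). `strict_comment` is an ordinary HTML::Parser option. No event handlers are passed, since the class installs its own.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser ();

my $hss = HTML::StripScripts::Parser->new(
    {
        Context => 'Flow',        # HTML::StripScripts configuration
        Rules   => { b => 0 },    # assumed: reject every <b> tag
    },
    strict_comment => 1,          # HTML::Parser option, passed to init()
);

my $output = $hss->filter_html('<b>bold</b> and <i>italic</i>');
print "$output\n";
```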

METHODS

See HTML::Parser for the input methods (such as parse(), parse_file() and eof()) and HTML::StripScripts for the output methods (such as filtered_document()).
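Because the input side is plain HTML::Parser, a document does not have to arrive in one piece. This sketch (again assuming the `Flow` context) feeds a fragment to parse() in several chunks, calls eof() to flush the parser, and then collects the result with filtered_document(). The `javascript:` URL is expected not to survive filtering.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser ();

my $hss = HTML::StripScripts::Parser->new( { Context => 'Flow' } );

# Input methods come from HTML::Parser: parse() can be called repeatedly,
# for example as data is read from a socket or a large file.
my @chunks = (
    '<p>Part one, ',
    '<a href="javascript:alert(1)">a link</a>',
    ' and part two.</p>',
);
$hss->parse($_) for @chunks;

# eof() flushes any buffered text and fires the end-of-document event.
$hss->eof;

# Output methods come from HTML::StripScripts.
my $clean = $hss->filtered_document;
print "$clean\n";
```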

filter_html()

filter_html() is a convenience method for filtering HTML already loaded into a scalar variable. It combines calls to HTML::Parser::parse(), HTML::Parser::eof() and HTML::StripScripts::filtered_document().

    $filtered_html = $hss->filter_html($html);
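Going by the implementation of filter_html() in this distribution, the one-call form and the explicit three-call form should give identical output. The sketch below checks that on a paragraph carrying an `onclick` handler, which the filter is expected to strip. A fresh filter object is used for each form so that the comparison does not depend on reusing a parser.

```perl
use strict;
use warnings;
use HTML::StripScripts::Parser ();

my $html = '<p onclick="steal()">Hi there</p>';

# One call...
my $one_step = HTML::StripScripts::Parser->new( { Context => 'Flow' } )
    ->filter_html($html);

# ...versus the three calls that filter_html() makes internally.
my $hss = HTML::StripScripts::Parser->new( { Context => 'Flow' } );
$hss->parse($html);
$hss->eof;
my $three_step = $hss->filtered_document;

print "same output\n" if $one_step eq $three_step;
```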




SUBCLASSING

The HTML::StripScripts::Parser class is subclassable. Filter objects are plain hashes. The hss_init() method takes the same arguments as new(), and calls the initialization methods of both HTML::StripScripts and HTML::Parser.

See "SUBCLASSING" in HTML::StripScripts and "SUBCLASSING" in HTML::Parser.
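As a sketch of the subclassing route, the hypothetical class `My::FragmentFilter` below overrides hss_init() to supply default configuration, then hands everything on to the parent class so that both HTML::Parser and HTML::StripScripts are still initialised. The default values (`Context => 'Flow'`, `AllowHref => 1`) are assumptions for illustration; `AllowHref` is an HTML::StripScripts option, so consult its documentation for the exact behaviour.

```perl
use strict;
use warnings;

package My::FragmentFilter;    # hypothetical subclass, for illustration only
use base qw(HTML::StripScripts::Parser);

# Assumed defaults; any keys the caller passes take precedence.
my %DEFAULTS = ( Context => 'Flow', AllowHref => 1 );

sub hss_init {
    my ( $self, $cfg, @parser_options ) = @_;
    $cfg ||= {};

    # Merge the caller's config over the defaults, then let the parent
    # hss_init() set up both the HTML::Parser and HTML::StripScripts sides.
    return $self->SUPER::hss_init( { %DEFAULTS, %$cfg }, @parser_options );
}

package main;

my $filter = My::FragmentFilter->new;
my $out    = $filter->filter_html(
    '<a href="http://example.com/">a link</a><script>evil()</script>'
);
print "$out\n";
```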

SEE ALSO

HTML::StripScripts, HTML::Parser, HTML::StripScripts::LibXML

BUGS

None reported.

Please report any bugs or feature requests to bug-html-stripscripts-parser@rt.cpan.org, or through the web interface at http://rt.cpan.org.

AUTHOR

Original author Nick Cleaton <nick@cleaton.net>

New code added and module maintained by Clinton Gormley <clint@traveljury.com>

COPYRIGHT

LICENSE

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

package HTML::StripScripts::Parser;
use strict;

use vars qw($VERSION);
$VERSION = '1.03';

use HTML::StripScripts;
use HTML::Parser;
use base qw(HTML::StripScripts HTML::Parser);

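#===================================
sub hss_init {
#===================================
    my ( $self, $cfg, @parser_options ) = @_;

    # Initialise the HTML::Parser side. Caller-supplied parser options come
    # first; the handlers below are listed after them, so they take
    # precedence and every parser event is routed to the matching
    # HTML::StripScripts input_* method.
    $self->init(
        @parser_options,

        api_version      => 3,
        start_document_h => [ 'input_start_document', 'self' ],
        start_h          => [ 'input_start', 'self,text' ],
        end_h            => [ 'input_end', 'self,text' ],
        text_h           => [ 'input_text', 'self,text' ],
        default_h        => [ 'input_text', 'self,text' ],
        declaration_h    => [ 'input_declaration', 'self,text' ],
        comment_h        => [ 'input_comment', 'self,text' ],
        process_h        => [ 'input_process', 'self,text' ],
        end_document_h   => [ 'input_end_document', 'self' ],

        # workaround for http://rt.cpan.org/NoAuth/Bug.html?id=3954:
        # force strict_comment on the affected HTML::Parser versions
        (  $HTML::Parser::VERSION =~ /^3\.(29|30|31)$/
           ? ( strict_comment => 1 )
           : ()
        ),
    );

    # Initialise the HTML::StripScripts side with the filter configuration.
    $self->SUPER::hss_init($cfg);
}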

#===================================
sub filter_html {
#===================================
    my ( $self, $html ) = @_;
    $self->parse($html);
    $self->eof;
    return $self->filtered_document;
}

1;