Syntax::Highlight::HTML - Highlight HTML syntax


Syntax-Highlight-HTML documentation Contained in the Syntax-Highlight-HTML distribution.

NAME

Syntax::Highlight::HTML - Highlight HTML syntax

VERSION

Version 0.04

SYNOPSIS

    use Syntax::Highlight::HTML;

    my $highlighter = Syntax::Highlight::HTML->new;
    my $output = $highlighter->parse($html);

If $html contains the following HTML fragment:

    <!-- a description list -->
    <dl compact="compact">
      <dt>some word</dt>
      <dd>the description of the word. Plus some <a href="/definitions/other_word"
      >reference</a> towards another definition. </dd>
    </dl>

then the resulting HTML contained in $output renders as the same fragment with syntax highlighting applied.

DESCRIPTION

This module takes raw HTML input and highlights it by wrapping each syntactic element in a <span> that carries a CSS class (see "NOTES" for the classes). The returned HTML code is ready for inclusion in a web page.

It is intended to be used as a highlighting filter, and as such does not reformat or reindent the original HTML code.

METHODS

new()

The constructor. Returns a Syntax::Highlight::HTML object, which derives from HTML::Parser. As such, any HTML::Parser method can be called on this object (except for parse(), which is overridden here).

Options

  • nnn - Activate line numbering. Default value: 0 (disabled).
  • pre - Surround the result with <pre>...</pre> tags. Default value: 1 (enabled).

Example

To avoid surrounding the result with <pre>...</pre> tags:

    my $highlighter = Syntax::Highlight::HTML->new(pre => 0);
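
The two options can also be combined. The sketch below assumes the module is installed. The input string and the "listing" class on the wrapper are made up for illustration. It turns on line numbering and leaves out the default <pre> wrapper so that the caller can supply its own container.

```perl
use strict;
use warnings;
use Syntax::Highlight::HTML;

# Illustrative input only; any HTML fragment will do.
my $html = "<p>one</p>\n<p>two</p>\n";

# nnn => 1 prefixes every line with a numbered <span class="h-lno">,
# pre => 0 leaves out the surrounding <pre>...</pre> tags.
my $highlighter = Syntax::Highlight::HTML->new(nnn => 1, pre => 0);
my $output = $highlighter->parse($html);

# Supply our own wrapper; the "listing" class is a hypothetical name.
print qq{<pre class="listing">\n$output</pre>\n};
```

Options that are not passed to new() keep their default values.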

parse()

Parses the HTML code given as argument and returns the highlighted HTML code, ready for inclusion in a web page.

Example

    $highlighter->parse("<p>Hello, world.</p>");

Internal Methods

The following methods are for internal use only.

_highlight_tag()

HTML::Parser tags handler: highlights a tag.

_highlight_text()

HTML::Parser text handler: highlights text.
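
As a rough illustration of what the text handler does with character entities, the sketch below applies the same substitution as the module source to a plain string, without involving HTML::Parser. The helper name highlight_entities is hypothetical and not part of the module. Only the h-ent class name and the substitution itself are taken from the module.

```perl
use strict;
use warnings;

# Class name used by the module for entities.
my $entity_class = 'h-ent';

# Hypothetical stand-alone helper, not part of the module's API.
# Wraps each "&...;" sequence in a span and escapes its ampersand,
# so the entity is shown literally instead of being decoded.
sub highlight_entities {
    my ($text) = @_;
    $text =~ s|&([^;]+;)|<span class="$entity_class">&amp;$1</span>|g;
    return $text;
}

my $result = highlight_entities('S&#233;bastien &middot; caf&eacute;');
print "$result\n";
```

Because the ampersand is escaped to &amp;, a browser displays the entity source (for instance &eacute;) rather than the character it stands for.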

NOTES

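The resulting HTML uses CSS to colourize the syntax. Here are the classes that you can define in your stylesheet.

  • .h-decl - a markup declaration, such as <!DOCTYPE ...>
  • .h-pi - a processing instruction, such as <?xml ...?>
  • .h-com - a comment, <!-- ... -->
  • .h-ab - the characters '<' and '>' as tag delimiters
  • .h-tag - the tag name of an element
  • .h-attr - an attribute name
  • .h-attv - an attribute value
  • .h-ent - an entity, such as &eacute; or &#171;
  • .h-lno - a line number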

An example stylesheet can be found in eg/html-syntax.css.
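
If the distribution's example stylesheet is not at hand, a starting point can be generated with a few lines of Perl. In the sketch below the class names are those used by the module. The colours and other declarations are placeholder values picked for this example and are not copied from eg/html-syntax.css.

```perl
use strict;
use warnings;

# Class names come from the module; the declarations are placeholder
# values chosen for this sketch, not copied from eg/html-syntax.css.
my @rules = (
    [ 'h-decl' => 'color: #336699; font-weight: bold' ],   # <!DOCTYPE ...>
    [ 'h-pi'   => 'color: #336699' ],                      # <?xml ...?>
    [ 'h-com'  => 'color: #669966; font-style: italic' ],  # <!-- ... -->
    [ 'h-ab'   => 'color: #000000; font-weight: bold' ],   # '<' and '>'
    [ 'h-tag'  => 'color: #993399; font-weight: bold' ],   # tag name
    [ 'h-attr' => 'color: #000099' ],                      # attribute name
    [ 'h-attv' => 'color: #333399' ],                      # attribute value
    [ 'h-ent'  => 'color: #cc3333' ],                      # entity
    [ 'h-lno'  => 'color: #aaaaaa; background: #f7f7f7' ], # line number
);

# Emit one CSS rule per class.
my $stylesheet = join '', map { sprintf ".%s { %s; }\n", @$_ } @rules;

print $stylesheet;
```

The printed rules can be saved to a .css file or placed in a <style> element of the page that includes the highlighted output.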

EXAMPLE

Here is an example of generated HTML output. It was generated with the script eg/highlight.pl.

The following HTML fragment (which is the beginning of http://search.cpan.org/~saper/)

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
    <html>
     <head>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      <link rel="stylesheet" href="/s/style.css" type="text/css">
      <title>search.cpan.org: S&#233;bastien Aperghis-Tramoni</title>
     </head>
     <body id="cpansearch">
    <center><div class="logo"><a href="/"><img src="/s/img/cpan_banner.png" alt="CPAN"></a></div></center>
    <div class="menubar">
     <a href="/">Home</a>
    &middot; <a href="/author/">Authors</a>
    &middot; <a href="/recent">Recent</a>
    &middot; <a href="/news">News</a>
    &middot; <a href="/mirror">Mirrors</a>
    &middot; <a href="/faq.html">FAQ</a>
    &middot; <a href="/feedback">Feedback</a>
    </div>
    <form method="get" action="/search" name="f" class="searchbox">
    <input type="text" name="query" value="" size="35">
    <br>in <select name="mode">
     <option value="all">All</option>
     <option value="module" >Modules</option>
     <option value="dist" >Distributions</option>
     <option value="author" >Authors</option>
    </select>&nbsp;<input type="submit" value="CPAN Search">
    </form>

is rendered with syntax highlighting, using the CSS stylesheet eg/html-syntax.css.

CAVEATS

Syntax::Highlight::HTML relies on HTML::Parser for parsing the HTML and therefore suffers from the same limitations.

SEE ALSO

HTML::Parser

AUTHORS

Sébastien Aperghis-Tramoni, <sebastien@aperghis.net>

BUGS

Please report any bugs or feature requests to bug-syntax-highlight-html@rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Syntax-Highlight-HTML. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

package Syntax::Highlight::HTML;
use strict;
use HTML::Parser;

{ no strict;
  $VERSION = '0.04';
  @ISA = qw(HTML::Parser);
}

my %classes = (
    declaration   => 'h-decl',  # declaration <!DOCTYPE ...>
    process       => 'h-pi',    # processing instruction <?xml ...?>
    comment       => 'h-com',   # comment <!-- ... -->
    angle_bracket => 'h-ab',    # the characters '<' and '>' as tag delimiters
    tag_name      => 'h-tag',   # the tag name of an element
    attr_name     => 'h-attr',  # the attribute name
    attr_value    => 'h-attv',  # the attribute value
    entity        => 'h-ent',   # any entities: &eacute; &#171;
    line_number   => 'h-lno',   # line number
);

my %defaults = (
    pre     => 1, # add <pre>...</pre> around the result? (default: yes)
    nnn     => 0, # add line numbers (default: no)
);

sub new {
    my $self = __PACKAGE__->SUPER::new(
        # API version
        api_version      => 3, 

        # Options
        case_sensitive   => 1, 
        attr_encoded     => 1, 

        # Handlers
        declaration_h    => [ \&_highlight_tag,  'self, event, tagname, attr, text' ], 
        process_h        => [ \&_highlight_tag,  'self, event, tagname, attr, text' ], 
        comment_h        => [ \&_highlight_tag,  'self, event, tagname, attr, text' ], 
        start_h          => [ \&_highlight_tag,  'self, event, tagname, attr, text' ], 
        end_h            => [ \&_highlight_tag,  'self, event, tagname, attr, text' ], 
        text_h           => [ \&_highlight_text, 'self, text' ], 
        default_h        => [ \&_highlight_text, 'self, text' ], 
    );
    
    my $class = ref $_[0] || $_[0]; shift;
    bless $self, $class;
    
    $self->{options} = { %defaults };
    
    my %args = @_;
    for my $arg (keys %defaults) {
        $self->{options}{$arg} = $args{$arg} if defined $args{$arg}
    }
    
    $self->{output} = '';
    
    return $self
}

sub parse {
    my $self = shift;
    
    ## parse the HTML fragment
    $self->{output} = '';
    $self->SUPER::parse($_[0]);
    $self->eof;
    
    ## add line numbering?
    if($self->{options}{nnn}) {
        my $i = 1;
        $self->{output} =~ s|^|<span class="$classes{line_number}">@{[sprintf '%3d', $i++]}</span> |gm;
    }
    
    ## add <pre>...</pre>?
    $self->{output} = "<pre>\n" . $self->{output} . "</pre>\n" if $self->{options}{pre};
    
    return $self->{output}
}

sub _highlight_tag {
    my $self = shift;
    my $event = shift;
    my $tagname = shift;
    my $attr = shift;
    
    $_[0] =~ s|&([^;]+;)|<span class="$classes{entity}">&amp;$1</span>|g;
    
    if($event eq 'declaration' or $event eq 'process' or $event eq 'comment') {
        $_[0] =~ s/</&lt;/g;
        $_[0] =~ s/>/&gt;/g;
        $self->{output} .= qq|<span class="$classes{$event}">| . $_[0] . '</span>'
    
    } else {
        $_[0] =~ s|^<$tagname|<<span class="$classes{tag_name}">$tagname</span>|;
        $_[0] =~ s|^</$tagname|</<span class="$classes{tag_name}">$tagname</span>|;
        $_[0] =~ s|^<(/?)|<span class="$classes{angle_bracket}">&lt;$1</span>|;
        $_[0] =~ s|(/?)>$|<span class="$classes{angle_bracket}">$1&gt;</span>|;
        
        for my $attr_name (keys %$attr) {
            next if $attr_name eq '/';
            $_[0] =~ s{\Q$attr_name\E=(["'])\Q$$attr{$attr_name}\E\1}
                        {<span class="$classes{attr_name}">$attr_name</span>=<span class="$classes{attr_value}">$1$$attr{$attr_name}</span>$1}
        }
        
        $self->{output} .= $_[0];
    }
}

sub _highlight_text {
    my $self = shift;
    $_[0] =~ s|&([^;]+;)|<span class="$classes{entity}">&amp;$1</span>|g;
    $self->{output} .= $_[0];
}

1; # End of Syntax::Highlight::HTML