HTML::Clean - Cleans up HTML code for web browsers, not humans


Contained in the HTML-Clean distribution.

NAME

HTML::Clean - Cleans up HTML code for web browsers, not humans

SYNOPSIS

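  use strict;
  use warnings;
  use HTML::Clean;

  my $h = HTML::Clean->new($filename);   # from a file, or...
  $h = HTML::Clean->new(\$htmlcode);     # ...from a scalar holding HTML, passed by reference

  $h->compat();                          # cross-platform compatibility fixes
  $h->strip();                           # size optimizations
  my $data = $h->data();                 # a scalar reference
  print $$data;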

DESCRIPTION

The HTML::Clean module encapsulates a number of common techniques for minimizing the size of HTML files. You can typically save between 10% and 50% of the size of an HTML file using these methods. It provides the following features:

Remove unneeded whitespace (beginning of line, etc.).
Remove unneeded META elements.
Remove HTML comments (except for styles, JavaScript, and SSI).
Replace tags with equivalent shorter tags (<strong> --> <b>).
etc.

The entire process is configurable, so you can pick and choose what you want to clean.

THE HTML::Clean CLASS

$h = new HTML::Clean($dataorfile, [$level]);

This creates a new HTML::Clean object, which is a prerequisite for calling any other method in this module.

The $dataorfile parameter supplies the input HTML: either a filename or a reference to a scalar holding the HTML, for example:

  my $h = HTML::Clean->new("/htdocs/index.html");
  my $html = "<strong>Hello!</strong>";
  $h = HTML::Clean->new(\$html);

An optional $level parameter controls the level of optimization performed. Levels range from 1 to 9: level 1 includes only simple, fast optimizations, and level 9 includes all optimizations.
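A minimal sketch of both constructor forms with an explicit level, assuming HTML::Clean is installed from CPAN. The /htdocs/index.html path is the one used above; the -e check only keeps the sketch runnable when that file does not exist. The die guard is defensive: this documentation does not say what new returns on failure.

```perl
use strict;
use warnings;
use HTML::Clean;

# Form 1: a reference to a scalar holding the HTML, cleaned at level 1
# (only the simple, fast optimizations).
my $html = "<strong>Hello!</strong>";
my $fast = HTML::Clean->new(\$html, 1);
die "could not create HTML::Clean object\n" unless $fast;

# Form 2: a filename, cleaned at level 9 (all optimizations).
my $file = "/htdocs/index.html";
if (-e $file) {
    my $full = HTML::Clean->new($file, 9);
    $full->strip();
    print ${ $full->data() };
}
```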

$h->initialize($dataorfile)

This method reinitializes the HTML data used by the current object, which is useful when you are processing many files.

$dataorfile has the same usage as in the new method.

Returns 0 on error, 1 on success.
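A sketch of batch processing with a single object, assuming HTML::Clean is installed. The clean_files helper is hypothetical, and writing each result to "<name>.clean" instead of overwriting the input is an arbitrary choice for this sketch. It relies on the documented return value of initialize() to skip files it cannot read.

```perl
use strict;
use warnings;
use HTML::Clean;

# Hypothetical helper: cleans each file and writes the result next to it
# as "<name>.clean", reusing one HTML::Clean object for all files.
# Returns the number of output files written.
sub clean_files {
    my @paths = @_;
    my $seed  = "";
    my $h     = HTML::Clean->new(\$seed);   # real input arrives via initialize()
    my $done  = 0;

    for my $path (@paths) {
        # initialize() returns 1 on success and 0 on error (e.g. unreadable file).
        unless ($h->initialize($path)) {
            warn "skipping $path\n";
            next;
        }
        $h->strip();

        open(my $out, '>', "$path.clean")
            or do { warn "cannot write $path.clean: $!\n"; next };
        print {$out} ${ $h->data() };
        close($out)
            or do { warn "cannot close $path.clean: $!\n"; next };
        $done++;
    }
    return $done;
}

my $count = clean_files(@ARGV);
print "cleaned $count file(s)\n";
```

Run it as, for example, `perl clean.pl *.html`; files that cannot be opened are reported on STDERR and skipped.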

$h->level([$level])

Get/set the optimization level. $level is a number from 1 to 9.
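A short sketch of the accessor used both ways, assuming HTML::Clean is installed. Changing the level on an existing object, before calling strip(), lets you trade speed for size without constructing a new object; the value 3 is arbitrary.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = "<strong>Hello!</strong>";
my $h    = HTML::Clean->new(\$html);

$h->level(3);               # set: a moderate optimization level
my $current = $h->level();  # get: read the level back
print "optimization level: $current\n";
```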

$myref = $h->data()

Returns the current HTML data as a scalar reference.
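Because data() returns a reference rather than the string itself, you dereference it to get the HTML text. A minimal sketch, assuming HTML::Clean is installed; the input markup is illustrative only.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = "<p>  Hello,   world!  </p>";
my $h = HTML::Clean->new(\$html);
$h->strip();

my $ref     = $h->data();   # a scalar reference, not the string itself
my $cleaned = $$ref;        # dereference to copy out the cleaned HTML
printf "%d bytes after cleaning\n", length($cleaned);
```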

$h->strip(\%options)

Removes excess whitespace and other unneeded markup from the HTML.

You can control which optimizations are applied by passing them in the %options hash reference.

The following options are recognized:

Boolean options (0 or 1):
  whitespace    Remove excess whitespace.
  shortertags   Replace tags with shorter equivalents (<strong> -> <b>, etc.).
  blink         Remove blink tags.
  contenttype   Remove the default content type.
  comments      Remove excess comments.
  entities      Replace entities with shorter forms (&quot; -> ", etc.).
  dequote       Remove quotes from tag parameters where possible.
  defcolor      Recode colors in shorter form (#ffffff -> white, etc.).
  javascript    Remove excess spaces and newlines in JavaScript code.
  htmldefaults  Remove default values for some HTML tags.
  lowercasetags Translate all HTML tags to lowercase.

Parameterized options:
  meta        A space-separated list of META tags to remove.
              The default is "GENERATOR FORMATTER".

  emptytags   A space-separated list of tags to remove when there is no
              content between the start and end tags, as in <b></b>.
              The default is "b i font center".
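A sketch that mixes boolean and parameterized options, assuming HTML::Clean is installed. The sample markup and the "SomeEditor" generator name are placeholders. The sketch assumes that options omitted from the hash keep the module's own defaults; this documentation does not spell that out.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = join("\n",
    '<html><head>',
    '<meta name="GENERATOR" content="SomeEditor">',
    '<title>Demo</title></head>',
    '<body>',
    '  <!-- a plain comment -->',
    '  <strong>Bold</strong> <b></b> <em>text</em>',
    '</body></html>',
);
my $original_length = length($html);   # captured before cleaning
my $h = HTML::Clean->new(\$html);

# Options not listed here are assumed to keep the module's defaults.
$h->strip({
    whitespace  => 1,
    shortertags => 1,
    comments    => 1,
    entities    => 0,            # leave entities such as &quot; untouched
    meta        => 'GENERATOR',  # remove only the GENERATOR META tag
    emptytags   => 'b i',        # remove only empty <b></b> and <i></i>
});

printf "%d -> %d bytes\n", $original_length, length(${ $h->data() });
```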

$h->compat()

This method improves the cross-platform compatibility of your HTML. It currently checks for the following problems:

IMG tags without ALT attributes.
Use of Arial, Futura, or Verdana as a font face.
A <TITLE> tag that does not immediately follow the <HEAD> tag.
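A sketch that feeds compat() markup containing each of the three problems listed above (an IMG without ALT, an Arial font face, and a META tag ahead of <TITLE>), then strips it in the same order as the SYNOPSIS. It assumes HTML::Clean is installed. The exact changes compat() makes are not specified here, so the sketch prints the result for inspection rather than predicting it.

```perl
use strict;
use warnings;
use HTML::Clean;

my $html = '<html><head><meta name="x" content="y"><title>Page</title></head>'
         . '<body><img src="logo.gif"><font face="Arial">Hi</font></body></html>';
my $h = HTML::Clean->new(\$html);

# Compatibility fixes first, then size optimizations.
$h->compat();
$h->strip();
print ${ $h->data() }, "\n";
```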

$h->defrontpage()

This method converts pages created with Microsoft FrontPage into something a Unix server will understand a bit better. It currently does the following:

Converts FrontPage hit counters into a Unix-specific format.
Removes some FrontPage-specific HTML comments.
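A sketch of where defrontpage() fits in a cleaning pass, assuming HTML::Clean is installed. The input is a placeholder with no FrontPage markup; substitute a real FrontPage-generated page to see the conversion. Running strip() afterwards is a choice made for this sketch, not a requirement stated above.

```perl
use strict;
use warnings;
use HTML::Clean;

# Placeholder input; a real FrontPage page would contain hit counters
# and FrontPage-specific comments.
my $html = "<html><body><p>Welcome</p></body></html>";
my $h = HTML::Clean->new(\$html);

$h->defrontpage();   # convert hit counters, drop FrontPage-only comments
$h->strip();         # then apply the usual size optimizations
print ${ $h->data() }, "\n";
```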

SEE ALSO

Modules

FrontPage::Web, FrontPage::File

Web Sites

Distribution Site - http://people.itu.int/~lindner/

AUTHORS

Paul Lindner for the International Telecommunication Union (ITU)

COPYRIGHT
