HTML::ListScraper::Interactive - formatting data from HTML::ListScraper
format_tags formats a tag sequence to emphasize its tree-like structure. It takes 2
or 3 parameters: an HTML::ListScraper object, an array reference
containing HTML::ListScraper::Tag objects, and an optional hash with
formatting options. format_tags returns a list (or an array reference
if called in scalar context) of formatted tag names and text.
The formatting options are:
- attr
  Include the href attribute in the output.
- text
  Include the plain text in the output.
- index
  Include tag positions in the output.
The returned values are essentially XHTML lines: opening tags, text with
quoted entities, and closing tags. Tags are enclosed in angle
brackets. The returned values don't necessarily form a valid XML
fragment, though, for example because the input tags need not form a
tree.
When index is set, tag values start with the tag's index, followed
by a tab. Next, spaces show indentation. An opening tag not identified
as missing a closing tag increases indentation by 2 spaces; a closing
tag decreases it again. An opening tag with a missing closing tag is
output with '/' appended to its name. For the rules of associating
opening and closing tags, see HTML::ListScraper::shapeless.
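
The indentation rules above can be illustrated with a minimal, self-contained sketch. It is not the module's implementation: indent_tags and its input format (pairs of tag name and an "unclosed" flag) are hypothetical, the index column is left out, and the sketch assumes a closing tag is printed at the same depth as its opening tag.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::ListScraper::Interactive.
# Each argument is [ $name, $unclosed ]; a name starting with '/'
# denotes a closing tag.
sub indent_tags {
    my @tags  = @_;
    my $depth = 0;
    my @lines;
    for my $tag (@tags) {
        my ($name, $unclosed) = @$tag;
        if ($name =~ m{^/}) {
            # A closing tag steps back out before it is printed
            # (assumption: it aligns with its opening tag).
            $depth-- if $depth > 0;
            push @lines, (' ' x (2 * $depth)) . "<$name>\n";
        }
        elsif ($unclosed) {
            # Opening tag without a closing tag: mark with '/',
            # leave the indentation unchanged.
            push @lines, (' ' x (2 * $depth)) . "<$name/>\n";
        }
        else {
            # Ordinary opening tag: print, then indent by 2 spaces.
            push @lines, (' ' x (2 * $depth)) . "<$name>\n";
            $depth++;
        }
    }
    # Mirror the documented convention: list in list context,
    # array reference in scalar context.
    return wantarray ? @lines : \@lines;
}

print indent_tags(
    [ 'ul', 0 ], [ 'li', 0 ], [ 'br', 1 ], [ '/li', 0 ], [ '/ul', 0 ]
);
```

Under these assumptions the br line comes out as "    <br/>", nested four spaces deep, while </li> and </ul> line up with their opening tags.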
When attr is set, links are formatted without whitespace and
enclosed in double quotes. Double quotes in links are escaped, but no
other characters are (which can also make the result invalid
HTML). When text is set, the output text has normalized whitespace;
nodes containing only whitespace are dropped. Gaps between adjacent
tag positions are displayed as an empty line. All values end with a
newline.
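
As a rough illustration of the text handling described above, the following sketch collapses whitespace, drops whitespace-only nodes, and ends every value with a newline. normalize_text is a hypothetical name, not a function of this module, and the sketch deliberately leaves out entity quoting and link formatting.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of HTML::ListScraper::Interactive.
# Takes raw text nodes and returns normalized, newline-terminated lines.
sub normalize_text {
    my @nodes = @_;
    my @out;
    for my $node (@nodes) {
        (my $text = $node) =~ s/\s+/ /g;   # collapse whitespace runs
        $text =~ s/^ //;                   # trim leading space
        $text =~ s/ $//;                   # trim trailing space
        next if $text eq '';               # drop whitespace-only nodes
        push @out, "$text\n";              # every value ends with a newline
    }
    return @out;
}

print normalize_text("  Hello \n  world ", " \t\n", "x");
```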
canonicalize_tags undoes the formatting done by format_tags. It takes a list of lines
such as those output by format_tags when called without any
formatting options and converts them to a list of tag names. Note that
canonicalize_tags doesn't handle attributes, text lines, or index
numbers.
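
The conversion can be pictured with the following sketch, which strips indentation, angle brackets, and the trailing newline from plain tag lines. strip_formatting is a hypothetical stand-in, not the module's canonicalize_tags; it skips lines that are not a single tag, and it leaves a trailing '/' marker in place, since how the real function treats that marker is not stated here.

```perl
use strict;
use warnings;

# Hypothetical stand-in, NOT the module's canonicalize_tags.
# Turns lines such as "  <li>\n" into bare tag names such as "li".
sub strip_formatting {
    my @lines = @_;
    my @names;
    for my $line (@lines) {
        # Accept only lines holding exactly one tag in angle brackets;
        # empty lines and anything else are skipped.
        next unless $line =~ /^\s*<([^<>]+)>\s*$/;
        push @names, $1;
    }
    return @names;
}

print join(' ', strip_formatting("<ul>\n", "  <li>\n", "\n", "  </li>\n", "</ul>\n")), "\n";
```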