Solstice::StringLibrary - A library of generic string manipulation functions


Solstice documentation  | view source Contained in the Solstice distribution.

NAME

Solstice::StringLibrary - A library of generic string manipulation functions

SYNOPSIS

  use Solstice::StringLibrary qw(truncstr);

  my $str = truncstr("This is a line of text that needs truncating.");

DESCRIPTION

Functions in this library make no assumptions about the content of the string being modified.

Superclass

Exporter

Export

No symbols are exported by default. Import functions by name, as in the SYNOPSIS.

Functions

htmltounicode($string)

Returns $string with all numeric entities of the form &#234; converted to the corresponding Perl Unicode characters.
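
The conversion described above can be sketched in a few lines of core Perl. This is a minimal illustration, not the Solstice implementation: the name htmltounicode_sketch is hypothetical, and it assumes that only decimal numeric entities are converted.

```perl
use strict;
use warnings;

# Hypothetical stand-in for htmltounicode(); not the Solstice implementation.
# Assumption: only decimal numeric entities such as &#234; are converted.
sub htmltounicode_sketch {
    my ($string) = @_;
    return undef unless defined $string;

    # chr() turns the captured decimal code point into a Perl character.
    $string =~ s/&#(\d+);/chr($1)/ge;
    return $string;
}

binmode(STDOUT, ':encoding(UTF-8)');
print htmltounicode_sketch('f&#234;te'), "\n";
```

Named entities such as &amp; pass through this sketch untouched.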

scrubhtml ($string)

Returns $string with all malicious scripts, broken tags, relative links, dynamic CSS, etc. removed.

    sub scrubhtml {
        my ($string) = @_;
        return undef unless defined $string;

        my $parser = Solstice::StripScripts::Parser->new({
            AllowSrc     => 1,
            AllowHref    => 1,
            AllowNonHTTP => 1,
        });
        $parser->parse($string);
        $parser->eof;
        return $parser->filtered_document;
    }

truncstr($string, $cutoff, $marker)

Returns $string truncated to $cutoff characters, with an optional cutoff marker appended (defaults to '...').
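
A minimal sketch of this kind of truncation follows. The name truncstr_sketch is hypothetical, and the sketch assumes that $cutoff counts the characters kept before the marker is appended. The documentation does not say whether the marker is included in the cutoff, so treat that as a guess.

```perl
use strict;
use warnings;

# Hypothetical stand-in for truncstr(); not the Solstice implementation.
# Assumption: $cutoff is the number of characters kept before the marker.
sub truncstr_sketch {
    my ($string, $cutoff, $marker) = @_;
    return undef unless defined $string;
    $marker = '...' unless defined $marker;

    # Strings that already fit (or a missing cutoff) pass through unchanged.
    return $string if !defined $cutoff || length($string) <= $cutoff;

    return substr($string, 0, $cutoff) . $marker;
}

print truncstr_sketch("This is a line of text that needs truncating.", 14), "\n";
```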

truncemail($string, $left_limit, $right_limit, $marker)

Returns $string with the part to the left of the first @ sign truncated to $left_limit characters and the part to the right of the last @ sign truncated to $right_limit characters. $marker replaces the removed text. The defaults are 20, 30 and '...'.
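
One way to read this description is sketched below. The name truncemail_sketch is hypothetical, and the placement of the marker (appended to whichever side was shortened) is an assumption rather than documented behavior.

```perl
use strict;
use warnings;

# Hypothetical stand-in for truncemail(); not the Solstice implementation.
# Assumption: the marker is appended to each side that gets shortened.
sub truncemail_sketch {
    my ($string, $left_limit, $right_limit, $marker) = @_;
    $left_limit  = 20    unless defined $left_limit;
    $right_limit = 30    unless defined $right_limit;
    $marker      = '...' unless defined $marker;

    my $first = index($string, '@');
    return $string if $first < 0;    # no @ sign, nothing to split on
    my $last = rindex($string, '@');

    my $left   = substr($string, 0, $first);
    my $middle = substr($string, $first, $last - $first + 1);    # first @ through last @
    my $right  = substr($string, $last + 1);

    $left  = substr($left,  0, $left_limit)  . $marker if length($left)  > $left_limit;
    $right = substr($right, 0, $right_limit) . $marker if length($right) > $right_limit;

    return $left . $middle . $right;
}

print truncemail_sketch('averyveryverylongusername@example.edu', 5, 30), "\n";
```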

fixstrlen($string, $cutoff, $marker)

Returns a string of fixed length. Strings shorter than $cutoff are returned unchanged. Strings longer than $cutoff are transformed as in the following example:

  Before: This is a long string of text that needs shortening
  After:  This is a long string o...ning
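
The example can be reproduced with the sketch below. Everything about it is inferred from the example rather than from the Solstice source: the name fixstrlen_sketch is hypothetical, the four-character tail is read off the "ning" suffix, and a $cutoff of 30 is assumed because the "After" string is 30 characters long.

```perl
use strict;
use warnings;

# Hypothetical stand-in for fixstrlen(); not the Solstice implementation.
# Assumptions: the last 4 characters are always kept, and the result is
# exactly $cutoff characters long (head + marker + tail).
sub fixstrlen_sketch {
    my ($string, $cutoff, $marker) = @_;
    $marker = '...' unless defined $marker;
    my $tail_len = 4;

    return $string if length($string) <= $cutoff;

    my $head_len = $cutoff - length($marker) - $tail_len;
    return $string if $head_len < 1;    # cutoff too small to shorten sensibly

    return substr($string, 0, $head_len) . $marker . substr($string, -$tail_len);
}

print fixstrlen_sketch("This is a long string of text that needs shortening", 30), "\n";
```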

fixlinewidth

Returns a string with breaking spaces inserted.

encode($string, $unsafe_chars)

Returns $string with HTML entities encoded. The string $unsafe_chars specifies which characters to consider unsafe (i.e., which to escape). By default, control chars, high-bit chars, and the <, &, >, ' and " characters are encoded. This function just wraps HTML::Entities::encode_entities.

decode($string)

Returns $string with HTML entities decoded. This function just wraps HTML::Entities::decode.

unrender($string, $convert_whitespace)

Returns $string transformed into a non-HTML-renderable string, by converting '&<"' chars to entities. Numeric entities are ignored. If $convert_whitespace is passed and is true, whitespace chars ' ', \t and \n are converted to HTML approximations.
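
A rough sketch of this transformation is given below. The name unrender_sketch is hypothetical, and the whitespace replacements (a non-breaking space for ' ', four of them for a tab, and a br tag for a newline) are assumptions, since the documentation only calls them "HTML approximations".

```perl
use strict;
use warnings;

# Hypothetical stand-in for unrender(); not the Solstice implementation.
# Assumption: the whitespace replacements below are illustrative guesses.
sub unrender_sketch {
    my ($string, $convert_whitespace) = @_;

    # Escape & first so the entities added below are not escaped again.
    # The lookahead leaves numeric entities such as &#233; alone.
    $string =~ s/&(?!#\d+;)/&amp;/g;
    $string =~ s/</&lt;/g;
    $string =~ s/"/&quot;/g;

    if ($convert_whitespace) {
        $string =~ s/\t/'&nbsp;' x 4/ge;
        $string =~ s/ /&nbsp;/g;
        # Newlines go last so the space inside the tag is not converted.
        $string =~ s{\n}{<br />}g;
    }
    return $string;
}

print unrender_sketch('a & b < "c" &#233;'), "\n";
```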

urlclean

Removes double slashes in URLs.
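
A minimal sketch of such a cleanup is shown below. The name urlclean_sketch is hypothetical, and the sketch assumes that the double slash after the scheme (as in http://) should be preserved, which the documentation does not state.

```perl
use strict;
use warnings;

# Hypothetical stand-in for urlclean(); not the Solstice implementation.
# Assumption: the "//" that follows the scheme's colon must survive.
sub urlclean_sketch {
    my ($url) = @_;

    # Collapse any run of two or more slashes into one, unless the run
    # starts directly after a colon.
    $url =~ s#(?<!:)//+#/#g;
    return $url;
}

print urlclean_sketch('http://example.com//a///b'), "\n";
```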

htmltotext($string)

$string should contain HTML. Returns $string with the HTML removed and replaced with whitespace formatting. For example:

    <ul>
    <li>a       becomes:    * a
    <li>b                   * b
    </ul>

    sub htmltotext {
        my $string = shift;
        return undef unless defined $string;

        # oh lord, this string replacement thing is so nasty, but
        # one of these html libraries was mangling entities.
        $string =~ s/\&([^;]+)?;/SOLSTICE__REPLACE__TOKEN$1;/g;

        my $tree = HTML::TreeBuilder->new_from_content($string);
        my $formatter = Solstice::StringLibrary::FormatText->new(leftmargin => 0, rightmargin => 55);
        $string = $formatter->format($tree);
        $tree->delete();

        $string =~ s/SOLSTICE__REPLACE__TOKEN/\&/g;
        $string =~ s/&nbsp;/ /g;
        return $string;
    }

extracttext($string)

$string should contain html. Returns $string with html removed.

convertspaces($string)

Returns $string transformed into a non-breaking HTML line by replacing ' ' with '&nbsp;'.

strtoascii($string)

Changes certain characters (curly quotes, emdash, endash) to their ASCII equivalent.

  \x91  curly single quote left
  \x92  curly single quote right
  \x93  curly double quote left
  \x94  curly double quote right
  \x95  bullet point
  \x96  endash
  \x97  emdash
  \xa9  copyright
  \x85  ellipsis
  •     bullet point

strtourl($string)

Returns $string transformed into a safe url, by url-encoding non-word characters.
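
The encoding step can be sketched as follows. The name strtourl_sketch is hypothetical, and the sketch assumes ASCII (or byte string) input, so that each non-word character maps to a single %XX escape.

```perl
use strict;
use warnings;

# Hypothetical stand-in for strtourl(); not the Solstice implementation.
# Assumption: input is ASCII or a byte string, one %XX escape per character.
sub strtourl_sketch {
    my ($string) = @_;

    # Percent-encode everything except letters, digits and underscore.
    $string =~ s/([^A-Za-z0-9_])/sprintf('%%%02X', ord($1))/ge;
    return $string;
}

print strtourl_sketch('a b&c'), "\n";
```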

strtofilename($string, $preserve_whitespace)

Returns $string transformed into a safe file name, by converting spaces to underscores and removing forward slashes. $preserve_whitespace specifies that whitespace should be escaped rather than translated.

strtojavascript($string)

Returns $string transformed into a javascript-safe string, by escaping single- and double-quote characters.
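
A minimal sketch of the quote escaping described above is given below. The name strtojavascript_sketch is hypothetical. It escapes only the two quote characters the description mentions, and whether the real function also handles backslashes or newlines is not documented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for strtojavascript(); not the Solstice implementation.
# Assumption: only ' and " are escaped, as the description states.
sub strtojavascript_sketch {
    my ($string) = @_;

    # Prefix each single or double quote with a backslash.
    $string =~ s/(['"])/\\$1/g;
    return $string;
}

print strtojavascript_sketch(q{It's "ok"}), "\n";
```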

trimstr($string)

Returns $string with leading and trailing whitespace removed.
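
Trimming of this kind is commonly written with two substitutions, as in the sketch below. The name trimstr_sketch is hypothetical and the code is not taken from the Solstice source.

```perl
use strict;
use warnings;

# Hypothetical stand-in for trimstr(); not the Solstice implementation.
sub trimstr_sketch {
    my ($string) = @_;

    $string =~ s/^\s+//;    # strip leading whitespace
    $string =~ s/\s+$//;    # strip trailing whitespace, including newlines
    return $string;
}

print '[', trimstr_sketch("  padded text \n"), "]\n";
```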

scrubcdata($string)

Returns $string with every ]]> sequence escaped, so that it is safe to place inside a CDATA section.
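
The documentation does not say how the sequence is escaped. A common technique, assumed here, is to split each ]]> across two CDATA sections. The name scrubcdata_sketch is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for scrubcdata(); not the Solstice implementation.
# Assumption: "]]>" is escaped by closing the CDATA section after "]]"
# and reopening a new one before ">".
sub scrubcdata_sketch {
    my ($string) = @_;

    $string =~ s/\]\]>/]]]]><![CDATA[>/g;
    return $string;
}

# Wrap the scrubbed text in a CDATA section.
print '<![CDATA[', scrubcdata_sketch('if (a[b[0]]>1) {}'), ']]>', "\n";
```

An XML parser reading the wrapped result sees two adjacent CDATA sections whose contents concatenate back to the original text.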

Modules Used

Exporter, HTML::Entities, HTML::TreeBuilder, HTML::FormatText, Solstice::StripScripts::Parser.

AUTHOR

Catalyst Group, <catalyst@u.washington.edu>

VERSION

$Revision: 2418 $

COPYRIGHT
