Text::Widont - Suppress typographic widows


Text-Widont documentation Contained in the Text-Widont distribution.

NAME

Text::Widont - Suppress typographic widows

SYNOPSIS

    use Text::Widont;

    # For a single string...
    my $string = 'Look behind you, a Three-Headed Monkey!';
    print widont($string, nbsp->{html});  # "...a Three-Headed&nbsp;Monkey!"

    # For a number of strings...
    my $strings = [
        'You fight like a dairy farmer.',
        'How appropriate. You fight like a cow.',
    ];
    print join "\n", @{ widont( $strings, nbsp->{html} ) };

Or the object oriented way:

    use Text::Widont qw( nbsp );

    my $tw = Text::Widont->new( nbsp => nbsp->{html} );

    my $string = "I'm selling these fine leather jackets.";
    print $tw->widont($string);  # "...fine leather&nbsp;jackets."




DESCRIPTION

Collins English Dictionary defines a "widow" in typesetting as:

    A short line at the end of a paragraph, especially one that occurs as the
    top line of a page or column.

For example, in the text...

    How much wood could a woodchuck
    chuck if a woodchuck could chuck
    wood?

...the word "wood" at the end is considered a widow. Using Text::Widont, that sentence would instead appear as...

    How much wood could a woodchuck
    chuck if a woodchuck could
    chuck wood?
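
As a rough illustration of the idea, the sketch below joins the last two words of that sentence with a non-breaking space, so that a renderer can no longer wrap between them. The helper name join_last_two_words is hypothetical and not part of the module; its substitution mirrors the one in the module source, and the HTML entity is used only so that the result is visible when printed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::Widont: replace the whitespace
# before the final word with the given non-breaking space.
sub join_last_two_words {
    my ( $text, $nbsp ) = @_;
    $text =~ s/(\S)\s+(\S+\s*)$/$1$nbsp$2/;
    return $text;
}

my $sentence = 'How much wood could a woodchuck chuck if a woodchuck could chuck wood?';
my $result   = join_last_two_words( $sentence, '&nbsp;' );

print "$result\n";
# How much wood could a woodchuck chuck if a woodchuck could chuck&nbsp;wood?
```

Because "chuck" and "wood?" are now a single unbreakable unit, a line break can only fall before "chuck", which gives the second layout shown above.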




NON-BREAKING SPACE TYPES

Text::Widont exports a hash ref, nbsp, that contains the following representations of a non-breaking space to be used with the widont function:

html

The &nbsp; HTML character entity.

html_hex

The &#xA0; HTML character entity (hexadecimal form).

html_dec

The &#160; HTML character entity (decimal form).

unicode

Unicode's "No-Break Space" character.
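
For instance, the four representations listed above can be inspected through the exported nbsp hash ref. This is a small sketch that assumes Text::Widont is installed; the loop and variable names are illustrative only.

```perl
use strict;
use warnings;
use Text::Widont;    # exports widont and nbsp by default

# The three HTML forms are plain ASCII strings and can be printed directly.
for my $type (qw( html html_hex html_dec )) {
    printf "%-8s => %s\n", $type, nbsp->{$type};
}

# The unicode form is the single character U+00A0; print its code point
# rather than the invisible character itself.
my $code_point = ord( nbsp->{unicode} );
printf "%-8s => U+%04X\n", 'unicode', $code_point;
```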

FUNCTIONS

widont( $string, $nbsp )

The widont function takes a string and returns a copy with the space between the final two words replaced with the given $nbsp. $string can optionally be a reference to an array of strings to transform; in that case the strings are modified in place and the same array reference is returned.

In the absence of an explicit $nbsp, Unicode's No-Break Space character will be used.
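
A short sketch of both calling styles, assuming Text::Widont is installed. Going by the module source, the array form rewrites the caller's strings, which the second part demonstrates; the variable names and sample strings are illustrative only.

```perl
use strict;
use warnings;
use Text::Widont;

# Single string: the original is left untouched and a new string is returned.
my $title  = 'The Secret of Monkey Island';
my $joined = widont( $title, nbsp->{html} );
print "$joined\n";    # The Secret of Monkey&nbsp;Island

# Array reference: each element is rewritten in place.
my $lines = [ 'You fight like a dairy farmer.' ];
widont( $lines, nbsp->{html_dec} );
print "$lines->[0]\n";    # You fight like a dairy&#160;farmer.

# With no $nbsp, the No-Break Space character (U+00A0) is inserted.
my $default    = widont('Ask me about Loom');
my $code_point = ord( substr( $default, -5, 1 ) );
printf "U+%04X\n", $code_point;    # U+00A0
```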

METHODS

Text::Widont also provides an object oriented interface.

->new( nbsp => $nbsp )

Instantiates a new Text::Widont object. nbsp is an optional argument that will be used when performing the substitution. It defaults to Unicode's No-Break Space character.

->widont( $string )

Performs the substitution described above, using the object's nbsp property and the given string.
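
A minimal sketch of the object interface, assuming Text::Widont is installed. Going by the module source, nbsp may be given either as a literal string or as one of the key names listed under NON-BREAKING SPACE TYPES (for example 'html_hex'); treat the key-name form as an observation about this version of the code rather than documented behaviour. The sample strings are illustrative.

```perl
use strict;
use warnings;
use Text::Widont;

# Pass the replacement string itself...
my $literal  = Text::Widont->new( nbsp => nbsp->{html} );
my $from_str = $literal->widont('That is the second biggest monkey head');
print "$from_str\n";    # That is the second biggest monkey&nbsp;head

# ...or, per the source, the name of one of the predefined types.
my $by_name   = Text::Widont->new( nbsp => 'html_hex' );
my $from_name = $by_name->widont('I am rubber, you are glue');
print "$from_name\n";   # I am rubber, you are&#xA0;glue
```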

DEPENDENCIES

Text::Widont requires the following modules:

* Carp

BUGS

Please report any bugs or feature requests to bug-text-widont at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Widont.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Text::Widont

You may also look for information at:

* Text::Widont

http://perlprogrammer.co.uk/modules/Text::Widont/

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Text-Widont/

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Widont

* Search CPAN

http://search.cpan.org/dist/Text-Widont/

AUTHOR

Dave Cardwell <dcardwell@cpan.org>

ACKNOWLEDGEMENTS

I was first introduced to the concept of typesetting widows and how they might be solved programmatically by Shaun Inman.

http://www.shauninman.com/archive/2006/08/22/widont_wordpress_plugin

COPYRIGHT AND LICENSE

package Text::Widont;

use strict;
use warnings;

use Carp qw( croak );

our $VERSION = '0.01';


# By default export the 'widont' function and 'nbsp' constant.
use base qw/ Exporter /;
our @EXPORT = qw( widont nbsp );



use constant nbsp => {
    html     => '&nbsp;',
    html_dec => '&#160;',
    html_hex => '&#xA0;',
    unicode  => pack( 'U', 0x00A0 ),
};



# This function also acts as an object method as described in the next POD
# section.
sub widont {
    my ( $self, $string, $nbsp );
    $string = shift;
    
    # Check to see if the subroutine has been called as an object method...
    if ( ref $string eq 'Text::Widont' ) {
        $self   = $string;
        $string = shift;
        
        $nbsp = $self->{nbsp} eq 'html'     ? nbsp->{html}
              : $self->{nbsp} eq 'html_dec' ? nbsp->{html_dec}
              : $self->{nbsp} eq 'html_hex' ? nbsp->{html_hex}
              : $self->{nbsp} eq 'unicode'  ? nbsp->{unicode}
              : $self->{nbsp};
    }
    
    # Make sure a $string was passed...
    croak 'widont requires a string' if !defined $string;
    
    # $nbsp defaults to unicode...
    $nbsp ||= shift || nbsp->{unicode};
    
    
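    # Iterate over the string(s) to perform the transformation. $_ is aliased
    # to each element, so strings in an array reference are changed in place.
    foreach ( ref $string eq 'ARRAY' ? @$string : $string ) {
        # Replace the whitespace before the final word with $nbsp, keeping
        # any trailing whitespace after that word.
        s/([^\s])\s+([^\s]+\s*)$/$1$nbsp$2/;
    }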
    
    return $string;
}



sub new {
    my $class = shift;
    my $self  = bless { @_ }, $class;
    
    # Default to No-Break Space.
    $self->{nbsp} ||= nbsp->{unicode};
    
    return $self;
}



# sub widont {} already defined above.



1;  # End of the module code; everything from here is documentation...
__END__