Lingua::ZH::Wrap - Wrap Chinese text


Lingua-ZH-Wrap documentation Contained in the Lingua-ZH-Wrap distribution.

Index


Code Index:

NAME

Top

Lingua::ZH::Wrap - Wrap Chinese text

VERSION

Top

This document describes version 0.03 of Lingua::ZH::Wrap, released July 25, 2004.

SYNOPSIS

Top

Example 1

    use Lingua::ZH::Wrap;

    $initial_tab = "\t";	# Tab before first line
    $subsequent_tab = "";	# All other lines flush left

    print wrap( $initial_tab, $subsequent_tab, @lines );

Example 2

    use Lingua::ZH::Wrap qw(wrap $columns $overflow);

    $columns  = 75;		# Change columns
    $overflow = 1;		# Chinese char may occupy 76th col

    print wrap( '', '', @lines );

DESCRIPTION

Top

Lingua::ZH::Wrap::wrap() is a very simple paragraph formatter. It formats a single paragraph at a time by breaking lines at Chinese character boundries.

Indentation is controlled for the first line ($initial_tab) and all subsequent lines ($subsequent_tab) independently. Please note: $initial_tab and $subsequent_tab are the literal strings that will be used: it is unlikely you would want to pass in a number.

OVERRIDES

Top

Lingua::ZH::Wrap::wrap() has a number of variables that control its behavior.

Lines are wrapped at $Lingua::ZH::Wrap::columns columns; if a Chinese character just extends columns by one byte, it will be wrapped into the next line, unless $Lingua::ZH::Wrap::overflow is set to a true value.

CAVEATS

Top

The algorithm doesn't care about breaking non-Chinese words. Also, if you pass in strings encoded unicode, it will currently first decode into Big5, do the conversion, then convert back.

Patches are, of course, very welcome; in particular, I'd like to use Lingua::ZH::TaBE to avoid beginning-of-line punctuations, as well as employing other semantic-sensitive formatting techniques.

SEE ALSO

Top

Text::Wrap

AUTHORS

Top

Autrijus Tang <autrijus@autrijus.org>

COPYRIGHT

Top


Lingua-ZH-Wrap documentation Contained in the Lingua-ZH-Wrap distribution.
# $File: //member/autrijus/Lingua-ZH-Wrap/Wrap.pm $ $Author: autrijus $
# $Revision: #3 $ $Change: 4527 $ $DateTime: 2003/03/03 03:17:20 $

package Lingua::ZH::Wrap;
$Lingua::ZH::Wrap::VERSION = '0.03';

use strict;
use vars qw($VERSION @ISA @EXPORT $columns $overflow);

use Exporter;

@ISA      = qw(Exporter);
@EXPORT   = qw(wrap $overflow $columns);
$columns  = 72;
$overflow = 0;

require Encode if $] >= 5.008;

sub wrap {
    if ($] >= 5.008 and Encode::is_utf8($_[2])) {
	return Encode::decode(big5 => _wrap(map {
	    Encode::is_utf8($_) ? Encode::encode(big5 => $_) : $_
	} @_));
    }

    return _wrap(@_);
}

sub _wrap {
    my ($init, $subs) = (shift, shift);

    return join("\n", map(_wrap($init, $subs, $_), @_)) if @_ > 1;

    my $str = shift;
    return join("\n", map(_wrap($init, $subs, $_), split("\n", $str)))
        if (index($str, "\n") > -1); # Handles single-line only

    my ($fin, $pos) = ($columns - 1, 0);

    $str = "$init$str";

    do {
        $str =~ m/\G.{0,$fin}[\x00-\x7f]/go;
        $pos += $columns + (((pos($str) ||= $pos) + $pos + $columns) % 2)
            * (!!$overflow * 2 - 1);
        return $str if $pos >= length($str);
        substr($str, $pos, 0, "\n$subs");
    } while (pos($str) = ++$pos);

    return $str;
}

1;