Unicode::Digits - Convert UNICODE digits to integers you can do math with


Unicode-Digits documentation Contained in the Unicode-Digits distribution.

Index


Code Index:

NAME

Top

Unicode::Digits - Convert UNICODE digits to integers you can do math with

VERSION

Top

Version 20090607

SYNOPSIS

Top

So, you have matched a string with \d and now want to do some math. What is that you say? The number your captured plus 5 is 5? Oh, that is right \d now matches UNICODE digits not [0-9]. What to do? Well, You can just call digits_to_int and all of your troubles* are over!

    use Unicode::Digits qw/digits_to_int/;

    my $string = "forty-two in Mongolian is \x{1814}\x{1812}";
    my $num = digits_to_int $string =~ /(\d+)/;
    print $num + 5, "\n";

FUNCTIONS

Top

digits_to_int(STRING)

The digits_to_int function transliterates a string of UNICODE digit characters to a number you can do math with, non-digit characters are passed through, so "42 is \x{1814}\x{1812}" becomes "42 is 42".

digits_to_int(STRING, ERRORHANDLING)

You can optionally pass an argument that controls what happens when the source string contains non-digit characters or characters from different sets of digits. ERRORHANDLING can be one of "strict", "loose", "looser", or "loosest". Their behaviours are as follows:

strict

All of the characters must be digit characters and they must all come from the same range (so no mixing Monglian digits with Arabic-Indic digits) or the function will die.

loose

All of the characters must be digit characters or it will die. If there are characters from different ranges you will get a warning.

looser

If there are any non digit characters, or the characters are from different ranges, you will get a warning.

loosest

This is the default mode, all non-digit characters are passed through witout warning, and the digits do not have to come from the same range.

AUTHOR

Top

Chas. J. Owens IV, <chas.owens at gmail.com>

DIAGNOSTICS

Top

"wrong number of arguments"

digits_to_int takes one or two arguments, if you have more than two or no arguments you will recieve this error.

"ERRORHANDLING must be strict, loose, looser, or loosest not '%s'"

If you pass a second argument that is not strict, loose, looser, or loosest to digits_to_int, you will recieve this error.

"string '%s' contains non-digit characters"

You will recieve this message as a warning or error (depending on what mode you chose), if the string has characters that do not have the UNICODE digit property.

"string '$s' contains digits from different ranges"

You will recieve this message as a warning or error (depending on what mode you chose), if the string has characters that are not part of the same range of digit characters.

"U+%x claims to be a digit, but doesn't have a digit number"

This error is unlikely to occur, if it does then the bug is either with my code (the likely scenario) or Unicode::UCD (not very likely).

BUGS

Top

My understanding of UNICODE is flawed, therefore, I have undoubtly done something wrong. For instance, what should be done with "5\x{0308}"? Also, there is a bunch of stuff relating to surrogates I don't understand.

SUPPORT

Top

You can find documentation for this module with the perldoc command.

    perldoc Unicode::Digits

COPYRIGHT & LICENSE

Top


Unicode-Digits documentation Contained in the Unicode-Digits distribution.
package Unicode::Digits;

use warnings;
use strict;

use Carp;
use Unicode::UCD qw/charinfo/;
use Exporter     qw/import/;

our @EXPORT_OK = qw(digits_to_int);

our $VERSION = '20090607';

sub _find_zero($) {
	my $ord = ord shift;
	return $ord - charinfo($ord)->{digit};
}

sub digits_to_int {
	croak "wrong number of arguments" unless @_ == 1 or @_ == 2;
	my ($string, $mode) = @_;
	$mode = "loosest" unless defined $mode;

	croak "ERRORHANDLING must be strict, loose, looser, or loosest not '$mode'"
		unless $mode =~ /^(?:strict|loose(?:r|st)?)$/;

	croak "string '$string' contains non-digit characters"
		if $mode =~ '^(?:strict|loose)$' and $string =~ /\D/;

	carp "string '$string' contains non-digit characters"
		if $mode eq "looser" and $string =~ /\D/;

	my $num;
	my ($first_num) = $string =~ /(\d)/;
	return $string unless defined $first_num;

	my $zero = _find_zero $first_num;

	for my $d (split //, $string) {
		if ($d =~ /\D/) {
			$num .= $d;
			next;
		}

		my $info = charinfo ord $d;

		croak "string '$string' contains digits from different ranges"
			if $mode eq 'strict' and $zero != _find_zero $d;

		carp "string '$string' contains digits from different ranges"
			if $mode =~ /^looser?$/ and $zero != _find_zero $d;

		die sprintf "U+%x claims to be a digit, but doesn't have a digit number", ord $d
			unless $info->{digit} =~ /[0-9]/;

		$num .= $info->{digit};
	}
	return $num;
}

"this is not an interesting return value";