| Text-Tokenizer documentation | Contained in the Text-Tokenizer distribution. |
Text::Tokenizer - Perl extension for tokenizing text (config) files
use Text::Tokenizer ':all';

# Open the file and add it to the tokenizer inputs
open(F_CONFIG, "input.conf") || die("failed to open input.conf");
my $tok_id = tokenizer_new(F_CONFIG);
tokenizer_options(TOK_OPT_NOUNESCAPE|TOK_OPT_PASSCOMMENT);

while(1)
{
    my ($string, $tok_type, $line, $err, $errline) = tokenizer_scan();
    last if($tok_type == TOK_ERROR || $tok_type == TOK_EOF);

    # Put the quote characters back around quoted tokens before printing
    if($tok_type == TOK_TEXT)       { }
    elsif($tok_type == TOK_BLANK)   { }
    elsif($tok_type == TOK_DQUOTE)  { $string = "\"$string\""; }
    elsif($tok_type == TOK_SQUOTE)  { $string = "\'$string\'"; }
    elsif($tok_type == TOK_SIQUOTE) { $string = "\`$string\'"; }
    elsif($tok_type == TOK_IQUOTE)  { $string = "\`$string\`"; }
    elsif($tok_type == TOK_EOL)     { $string = "\n"; }
    elsif($tok_type == TOK_COMMENT) { }
    elsif($tok_type == TOK_UNDEF)   { last; }
    else                            { last; }
    print $string;
}
tokenizer_delete($tok_id);
A much more complex example of using Text::Tokenizer can be found in passwd_exp, a tool for password
expiration notification (http://freshmeat.net/projects/passwd_exp).
Text::Tokenizer is a very fast lexical analyzer that splits input text from a file or buffer into basic tokens: plain text, quoted strings, whitespace, comments, and end-of-line and end-of-file markers.
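Exports: the ':all' tag imports all constants and functions; names can also be imported selectively. In version 0.4.2 the module source also lists every constant and function in @EXPORT, so a plain use Text::Tokenizer; imports them too.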
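Token types:
TOK_UNDEF - Undefined token (tokenizer error)
TOK_TEXT - Normal_text
TOK_DQUOTE - "Double quoted text"
TOK_SQUOTE - 'Single quoted text'
TOK_IQUOTE - `Inverse quoted text`
TOK_SIQUOTE - `Single-inverse quoted text'
TOK_BLANK - Whitespace text
TOK_COMMENT - #Comment
TOK_EOL - End of Line
TOK_EOF - End of File
TOK_ERROR - Error condition (see the error types listed next)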
Error types:
NOERR - No error
UNCLOSED_DQUOTE - Unclosed double quote found
UNCLOSED_SQUOTE - Unclosed single quote found
UNCLOSED_IQUOTE - Unclosed inverse quote found
NOCONTEXT - Failed to allocate tokenizer context (FATAL ERROR)
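The error types above can be turned into readable diagnostics. The following is a minimal sketch, not part of the distribution: describe_error is a hypothetical helper, and it assumes that the $err value returned by tokenizer_scan() holds one of the error constants and that $errline is the line the error refers to.

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical helper: translate an error code into a readable message.
# Assumption: $err from tokenizer_scan() is one of the error constants.
sub describe_error {
    my ($err) = @_;
    return "no error"                             if $err == NOERR;
    return "unclosed double quote"                if $err == UNCLOSED_DQUOTE;
    return "unclosed single quote"                if $err == UNCLOSED_SQUOTE;
    return "unclosed inverse quote"               if $err == UNCLOSED_IQUOTE;
    return "failed to allocate tokenizer context" if $err == NOCONTEXT;
    return "unknown error code $err";
}

# Illustrative input with a double quote that is never closed.
my $input  = qq{name = "never closed\n};
my $tok_id = tokenizer_new_strbuf($input, length($input));

while (1) {
    my ($string, $type, $line, $err, $errline) = tokenizer_scan();
    if ($type == TOK_ERROR) {
        # Assumption: $errline is the line the error refers to.
        warn "input error: ", describe_error($err), " (line $errline)\n";
        last;
    }
    last if $type == TOK_EOF || $type == TOK_UNDEF;
}
tokenizer_delete($tok_id);
```

Keeping the mapping in one small function means the scanning loop only has to test for TOK_ERROR and can leave the wording of messages elsewhere.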
Tokenizer options:
TOK_OPT_DEFAULT - Default option set; equal to TOK_OPT_NOUNESCAPE.
TOK_OPT_NONE - Set no options. The tokenizer keeps its default behaviour: it does not unescape anything and does not pass comments to you.
TOK_OPT_NOUNESCAPE - Disable unescaping of characters and lines.
TOK_OPT_SIQUOTE - Enable recognition of the `single-inverse quote' combination.
TOK_OPT_UNESCAPE - Unescape characters and lines.
TOK_OPT_UNESCAPE_CHARS - Unescape characters (inside quotes only).
TOK_OPT_UNESCAPE_LINES - Unescape lines (inside quotes only).
TOK_OPT_PASSCOMMENT - Pass comments to user routines.
TOK_OPT_UNESCAPE_NQ_LINES - Unescape lines (outside quotes). An escaped end of line does not terminate value processing, so escaped multiline text is returned as a single-line string.
Functions:
tokenizer_options - Set tokenizer options.
tokenizer_new(FILE_HANDLE) - Create a new tokenizer instance (context) from FILE_HANDLE. Returns the tokenizer instance id ($tok_id).
tokenizer_new_strbuf(BUFFER, LENGTH) - Create a new tokenizer instance from the string BUFFER, which is LENGTH characters long. Returns its tokenizer instance id.
tokenizer_scan() - Scan the current tokenizer instance and return the next token found: @tok = ($string, $type, $line, $error, $error_line)
tokenizer_exists - Test whether a tokenizer instance exists.
tokenizer_switch - Switch to another tokenizer instance (for example, when processing an include statement).
tokenizer_delete - Delete a tokenizer instance. You have to do this on EOF to release the connection to the file or buffer.
tokenizer_flush - Flush a tokenizer instance. This discards the contents of the instance's buffer, so the next time the scanner tries to match a token from the buffer, it has to refill it.
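Tokenizing an in-memory string follows the same pattern as the file-based example. This is a minimal sketch under stated assumptions: tokenize_string is a hypothetical wrapper, the sample input is made up, and it assumes (as in the file-based example) that a newly created instance becomes the current one for tokenizer_scan().

```perl
use strict;
use warnings;
use Text::Tokenizer ':all';

# Hypothetical wrapper: tokenize a string and return a list of
# [type, text, line] triples. Scanning stops on EOF, error, or an
# undefined token.
sub tokenize_string {
    my ($buffer) = @_;
    my $tok_id = tokenizer_new_strbuf($buffer, length($buffer));
    my @tokens;
    while (1) {
        my ($string, $type, $line, $err, $errline) = tokenizer_scan();
        last if $type == TOK_EOF || $type == TOK_ERROR || $type == TOK_UNDEF;
        push @tokens, [$type, $string, $line];
    }
    # Release the instance once the buffer is exhausted.
    tokenizer_delete($tok_id);
    return @tokens;
}

# Illustrative input; print each token's line, numeric type, and text.
for my $tok (tokenize_string(qq{name = "some value"\n})) {
    my ($type, $text, $line) = @$tok;
    $text = '' unless defined $text;
    $line = '' unless defined $line;
    print "line=$line type=$type text=[$text]\n";
}
```

Collecting tokens into a list is convenient for small inputs; for large files the streaming loop from the first example avoids holding every token in memory.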
This tokenizer is based on code generated by flex, the fast lexical analyzer generator (http://lex.sourceforge.net).
Samuel Behan, <_samkob_(a)_gmail_._com_>
Copyright 2003-2006 by Samuel Behan
This library is free software; you can redistribute it and/or modify it under the terms of the GNU GPL v2.
package Text::Tokenizer;

use strict;
use warnings;
use Carp;

require Exporter;
use AutoLoader;

our @ISA = qw(Exporter);

# Items to export into callers namespace by default. Note: do not export
# names by default without a very good reason. Use EXPORT_OK instead.
# Do not simply export all your public functions/methods/constants.

# This allows declaration use Tokenizer ':all';
# If you do not need this, moving things directly into @EXPORT or @EXPORT_OK
# will save memory.
our %EXPORT_TAGS = ( 'all' => [ qw(
    TOK_UNDEF TOK_TEXT TOK_DQUOTE TOK_SQUOTE TOK_IQUOTE TOK_SIQUOTE
    TOK_BLANK TOK_ERROR TOK_EOL TOK_COMMENT TOK_EOF
    TOK_BASH_COMMENT TOK_C_COMMENT TOK_CC_COMMENT
    NOERR UNCLOSED_DQUOTE UNCLOSED_SQUOTE UNCLOSED_IQUOTE NOCONTEXT
    UNCLOSED_C_COMMENT
    TOK_OPT_DEFAULT TOK_OPT_NONE TOK_OPT_NOUNESCAPE TOK_OPT_SIQUOTE
    TOK_OPT_UNESCAPE TOK_OPT_UNESCAPE_CHARS TOK_OPT_UNESCAPE_LINES
    TOK_OPT_PASSCOMMENT TOK_OPT_PASS_COMMENT TOK_OPT_UNESCAPE_NQ_LINES
    TOK_OPT_C_COMMENT TOK_OPT_CC_COMMENT TOK_OPT_NO_BASH_COMMENT
    TOK_OPT_NO_IQUOTE
    tokenizer_options tokenizer_new tokenizer_new_strbuf tokenizer_scan
    tokenizer_exists tokenizer_switch tokenizer_delete tokenizer_flush
    tokenizer_destroy
) ] );

our @EXPORT_OK = ( @{ $EXPORT_TAGS{'all'} } );

our @EXPORT = qw(
    TOK_UNDEF TOK_TEXT TOK_DQUOTE TOK_SQUOTE TOK_IQUOTE TOK_SIQUOTE
    TOK_BLANK TOK_ERROR TOK_EOL TOK_COMMENT TOK_EOF
    TOK_BASH_COMMENT TOK_C_COMMENT TOK_CC_COMMENT
    NOERR UNCLOSED_DQUOTE UNCLOSED_SQUOTE UNCLOSED_IQUOTE NOCONTEXT
    UNCLOSED_C_COMMENT
    TOK_OPT_DEFAULT TOK_OPT_NONE TOK_OPT_NOUNESCAPE TOK_OPT_SIQUOTE
    TOK_OPT_UNESCAPE TOK_OPT_UNESCAPE_CHARS TOK_OPT_UNESCAPE_LINES
    TOK_OPT_PASSCOMMENT TOK_OPT_PASS_COMMENT TOK_OPT_UNESCAPE_NQ_LINES
    TOK_OPT_C_COMMENT TOK_OPT_CC_COMMENT TOK_OPT_NO_BASH_COMMENT
    TOK_OPT_NO_IQUOTE
    tokenizer_options tokenizer_new tokenizer_new_strbuf tokenizer_scan
    tokenizer_exists tokenizer_switch tokenizer_delete tokenizer_flush
    tokenizer_destroy
);

our $VERSION = '0.4.2';

sub AUTOLOAD {
    # This AUTOLOAD is used to 'autoload' constants from the constant()
    # XS function.

    my $constname;
    our $AUTOLOAD;
    ($constname = $AUTOLOAD) =~ s/.*:://;
    croak "&Tokenizer::constant not defined" if $constname eq 'constant';
    my ($error, $val) = constant($constname);
    if ($error) { croak $error; }
    {
        no strict 'refs';
        # Fixed between 5.005_53 and 5.005_61
#XXX    if ($] >= 5.00561) {
#XXX        *$AUTOLOAD = sub () { $val };
#XXX    }
#XXX    else {
            *$AUTOLOAD = sub { $val };
#XXX    }
    }
    goto &$AUTOLOAD;
}

require XSLoader;
XSLoader::load('Text::Tokenizer', $VERSION);

# Preloaded methods go here.

# Autoload methods go after =cut, and are processed by the autosplit program.

1;
__END__
# Below is stub documentation for your module. You'd better edit it!