Text::Statistics::Latin - Performs statistical analysis of corpora


Text-Statistics-Latin documentation  | view source Contained in the Text-Statistics-Latin distribution.

NAME

Text::Statistics::Latin - Performs statistical analysis of corpora

VERSION

Version 0.06

SYNOPSIS

    use Text::Statistics::Latin;
    &Text::Statistics::Latin::LATIN();

DESCRIPTION

Given a corpus as input, Text::Statistics::Latin creates a seven-column CSV file as output, with one line for each token per text. Names of input files must match the following pattern:

    '1 (1).txt', '1 (2).txt', ..., '1 (n).txt'

or, as a regular expression:

    1 \(([1-9]|[1-9][0-9]+)\)\.txt
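
As a minimal sketch, the check below tests candidate file names against this pattern and extracts n. The helper name corpus_file_id is hypothetical and not part of the module, and the ^ and $ anchors are an added assumption so that the whole name has to match.

```perl
use strict;
use warnings;

# Anchored form of the pattern given above; the capture group holds n.
my $NAME_RE = qr/^1 \(([1-9]|[1-9][0-9]+)\)\.txt$/;

# Hypothetical helper: return the document id n for a valid corpus
# file name, or undef if the name does not match the pattern.
sub corpus_file_id {
    my ($name) = @_;
    return $name =~ $NAME_RE ? $1 : undef;
}

for my $name ('1 (1).txt', '1 (12).txt', '1 (0).txt', '2 (1).txt') {
    my $id = corpus_file_id($name);
    printf "%-12s %s\n", $name, defined $id ? "ok, n = $id" : "rejected";
}
```

Note that the pattern rejects n = 0 and leading zeros, so '1 (01).txt' would not be picked up.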

Columns store statistical information:

    (1) number of word forms in document d;
    (2) number of tokens in d;
    (3) id number of d, i.e., n;
    (4) frequency of term t in d;
    (5) corpus frequency of t;
    (6) document frequency of t (number of documents where t occurs at least once);
    (7) t, the UTF-8 Latin-coded token string, delimited by /[ -@]|[\[-`]|[{-¿]|[\x{250}-\x{2E9}]|[\x{374}-\x{FFFD}]/
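
To make the seven columns concrete, here is a rough sketch that computes them for a small in-memory corpus. It is not the module's implementation: tokenize and corpus_rows are hypothetical names, the input is a hash of single-line strings rather than files, tokens are compared case-sensitively, and one row is emitted per distinct token per document, all of which are assumptions.

```perl
use strict;
use warnings;

# Delimiter ranges from column (7), merged into one character class and
# written as code-point escapes so the source encoding does not matter.
my $DELIM = qr/[\x20-\x40\x5B-\x60\x7B-\x{BF}\x{250}-\x{2E9}\x{374}-\x{FFFD}]/;

# Split a single-line string into tokens, dropping empty fields.
sub tokenize {
    my ($text) = @_;
    return grep { length } split /$DELIM+/, $text;
}

# Take (id => text, ...) and return one array ref per (document, term):
# [word forms in d, tokens in d, id of d, tf, corpus freq, doc freq, term]
sub corpus_rows {
    my (%docs) = @_;
    my (%tf, %cf, %df, %ntokens);
    for my $id (keys %docs) {
        for my $t (tokenize($docs{$id})) {
            $df{$t}++ unless $tf{$id}{$t}++;   # first sighting of t in d
            $cf{$t}++;
            $ntokens{$id}++;
        }
    }
    my @rows;
    for my $id (sort { $a <=> $b } keys %tf) {
        my $forms = keys %{ $tf{$id} };        # distinct terms in d
        for my $t (sort keys %{ $tf{$id} }) {
            push @rows, [ $forms, $ntokens{$id}, $id,
                          $tf{$id}{$t}, $cf{$t}, $df{$t}, $t ];
        }
    }
    return @rows;
}

my %sample = (1 => 'arma virumque cano, arma', 2 => 'cano et canto');
print join(',', @$_), "\n" for corpus_rows(%sample);
```

The delimiter class starts at the space character, so control characters such as newlines are not delimiters under this pattern; that is why the sample strings are kept on one line.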

The main output file is named '1 (n + 5).txt' and is stored in the same directory as the corpus, together with residual files for each input file, which carry the ad hoc extensions .txu and .txv.
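
The naming rule for the main output file can be sketched as follows. The helper main_output_path and the directory name 'corpus' are hypothetical; only the '1 (n + 5).txt' rule comes from the description above.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: path of the main output file for a corpus of
# $n input files stored in directory $dir.
sub main_output_path {
    my ($dir, $n) = @_;
    die "n must be a positive integer\n"
        unless defined $n && $n =~ /^[1-9][0-9]*$/;
    return File::Spec->catfile($dir, sprintf('1 (%d).txt', $n + 5));
}

# For a 5-file corpus the main output file is '1 (10).txt'.
print main_output_path('corpus', 5), "\n";
```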

This code was written under CAPES BEX-09323-5.

Example:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Text::Statistics::Latin;

    # 5 files are analyzed.
    # The main output file created is '1 (10).txt'.
    &Text::Statistics::Latin::LATIN("5");

EXPORT

    &LATIN();

BUGS

Please report any bugs or feature requests to bug-text-statistics-latin at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Statistics-Latin. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Text::Statistics::Latin

You can also look for information at:

* AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Text-Statistics-Latin

* CPAN Ratings

http://cpanratings.perl.org/d/Text-Statistics-Latin

* RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Text-Statistics-Latin

* Search CPAN

http://search.cpan.org/dist/Text-Statistics-Latin

COPYRIGHT & LICENSE
