Lingua::DE::Tagger

This module uses part-of-speech statistics from the Penn Treebank to assign POS tags to German text. The tagger applies a bigram (two-word) Hidden Markov Model to guess the appropriate POS tag for a word. That means that the tagger will try to assign a POS tag based on the known POS tags for a given word and the POS tag assigned to its predecessor.

The tagger tends to assume unknown words are nouns, but this behavior is configurable.

The POS tagger can also be used to find maximal noun phrases in tagged text. You can also use this module to extract all nouns and/or noun phrases.

TAG SET


The set of POS tags used here is a modified version of the Penn Treebank tagset. Tags with non-letter characters have been redefined to work better in our data structures. Also, the ``Determiner'' tag (DET) has been changed from `DT', in order to avoid confusion with the HTML tag, <DT>.
CC      Conjunction, coordinating               and, or
CD      Adjective, cardinal number              3, fifteen
DET     Determiner                              this, each, some
EX      Pronoun, existential there              there
FW      Foreign words           
IN      Preposition / Conjunction               for, of, although, that
JJ      Adjective                               happy, bad
JJR     Adjective, comparative                  happier, worse
JJS     Adjective, superlative                  happiest, worst
LS      Symbol, list item                       A, A.
MD      Verb, modal                             can, could, 'll
NN      Noun                                    aircraft, data
NNP     Noun, proper                            London, Michael
NNPS    Noun, proper, plural                    Australians, Methodists
NNS     Noun, plural                            women, books
PDT     Determiner, prequalifier                quite, all, half
POS     Possessive                              's, '
PRP     Determiner, possessive second           mine, yours
PRPS    Determiner, possessive                  their, your
RB      Adverb                                  often, not, very, here
RBR     Adverb, comparative                     faster
RBS     Adverb, superlative                     fastest
RP      Adverb, particle                        up, off, out
SYM     Symbol                                  *
TO      Preposition                             to
UH      Interjection                            oh, yes, mmm
VB      Verb, infinitive                        take, live
VBD     Verb, past tense                        took, lived
VBG     Verb, gerund                            taking, living
VBN     Verb, past/passive participle           taken, lived
VBP     Verb, base present form                 take, live
VBZ     Verb, present 3SG -s form               takes, lives
WDT     Determiner, question                    which, whatever
WP      Pronoun, question                       who, whoever
WPS     Determiner, possessive & question       whose
WRB     Adverb, question                        when, how, however

PP      Punctuation, sentence ender             ., !, ?
PPC     Punctuation, comma                      ,
PPD     Punctuation, dollar sign                $
PPL     Punctuation, quotation mark left        ``
PPR     Punctuation, quotation mark right       ''
PPS     Punctuation, colon, semicolon, elipsis  :, ..., -
LRB     Punctuation, left bracket               (, {, [
RRB     Punctuation, right bracket              ), }, ]

INSTALLATION

To install this module, run the following commands:

        perl Makefile.PL
        make
        make test
        make install

SUPPORT AND DOCUMENTATION

After installing, you can find documentation for this module with the perldoc command.

perldoc Lingua::DE::Tagger

You can also look for information at:

RT, CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Lingua-DE-Tagger

AnnoCPAN, Annotated CPAN documentation

http://annocpan.org/dist/Lingua-DE-Tagger

CPAN Ratings

http://cpanratings.perl.org/d/Lingua-DE-Tagger

Search CPAN

http://search.cpan.org/dist/Lingua-DE-Tagger

COPYRIGHT AND LICENCE

Copyright (C) 2008 Tobias Schulz

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.