Lingua::PT::ProperNames - Simple module to extract proper names from Portuguese Text


Contained in the Lingua-PT-ProperNames distribution.

NAME

Lingua::PT::ProperNames - Simple module to extract proper names from Portuguese Text

Version

Version 0.09

Synopsis

This module contains simple Perl-based functions to detect and extract proper names from Portuguese text.

  use Lingua::PT::ProperNames;




  printPN(@options);
  printPNstring({ %options... }, $textstring);
  printPNstring([ @options... ], $textstring);

  forPN( sub{my ($pn, $context)=@_;... } ) ;
  forPN( {t=>"double"},
         sub{my ($pn, $context)=@_;... }, sub{...} ) ;
  $outstr = forPN($instr, sub{my ($pn, $context)=@_;... }, ... ) ;

  forPNstring(sub{my ($pn, $context)=@_;... },
         $textstring, $regsep) ;

  my $pndict = Lingua::PT::ProperNames->new;

ProperNames dictionary

new

Creates a new ProperNames dictionary

is_name

This method checks if a name exists in the Names dictionary.

is_surname

This method checks whether a name exists in the Names dictionary as a surname.
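
A minimal sketch of querying the dictionary, assuming that is_name and is_surname each take a single string and return a true or false value (the documentation above does not spell out the argument form). The sample words are only illustrative.

```perl
use strict;
use warnings;
use Lingua::PT::ProperNames;

my $pndict = Lingua::PT::ProperNames->new;

# Assumption: both methods accept one string and return a boolean-like value.
for my $word (qw(Maria Silva mesa)) {
    my @kinds;
    push @kinds, 'name'    if $pndict->is_name($word);
    push @kinds, 'surname' if $pndict->is_surname($word);
    printf "%-8s %s\n", $word, @kinds ? join(', ', @kinds) : 'not in dictionary';
}
```

What is reported for each word depends on the dictionary shipped with the installed version.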

Export the following functions

forPN

Replaces every proper name read from STDIN with the result of funref($propername, $context) and sends the output to STDOUT.

Usage:

   forPN({options...}, sub{ propername processor...})

Optionally you can define input or output files:

   forPN({in=> "inputfile", out => "outputfile" }, sub{...})

Optionally, pass the option {t => "double"} to give special treatment to proper names that follow punctuation (".", etc.). With this option you must provide two functions: one for ordinary proper names and one for names after punctuation.

   forPN({t=>"double"}, sub{...}, sub{...})

You can also define the paragraph (record) separator:

   forPN({sep=>"\n", t=>"normal"}, sub{...}) ## each line is a paragraph
   forPN({sep=>""}, sub{...})                ## paragraphs separated by empty lines
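
The options above can be tied together in a short sketch. It assumes that in and t may be combined in one option hash, and that with t => "double" the first callback handles ordinary names while the second handles names following punctuation. The sample text, the callback names and the <pn> markup are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Lingua::PT::ProperNames;

# Hypothetical callbacks: each receives the proper name and its context
# and returns the replacement text.
sub tag_pn      { my ($pn, $context) = @_; return "<pn>$pn</pn>"; }
sub tag_initial { my ($pn, $context) = @_; return "<pn pos=\"initial\">$pn</pn>"; }

# Write a small ASCII-only sample to a temporary input file so the
# sketch does not depend on any existing file.
my ($in_fh, $in_path) = tempfile(UNLINK => 1);
print {$in_fh} "Ontem o Pedro visitou Braga. Maria ficou em Lisboa.\n";
close $in_fh or die "cannot close $in_path: $!";

# Assumption: in and t can be given together; with no out option the
# result goes to STDOUT, as described above.
forPN({ in => $in_path, t => "double" }, \&tag_pn, \&tag_initial);
```

Adding out => "outputfile" to the option hash should redirect the result to a file instead of STDOUT.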

forPNstring

   forPNstring( $funref, "textstring" [, regSeparator] )

Replaces every proper name in the text string with the result of funref(propername).
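
As an illustration, the callback can also be used just to collect names. The sketch below assumes the callback's return value is what replaces the name, so returning the name unchanged leaves the text as it was. The sample sentence and the count_pn helper are hypothetical.

```perl
use strict;
use warnings;
use Lingua::PT::ProperNames;

my %count;

# Hypothetical callback: tally each proper name and return it unchanged.
sub count_pn {
    my ($pn, $context) = @_;
    $count{$pn}++;
    return $pn;
}

my $text = "O Pedro e a Maria foram a Braga. O Pedro voltou a Lisboa.";
forPNstring(\&count_pn, $text);

# Print each detected name with the number of times it was seen.
printf "%-10s %d\n", $_, $count{$_} for sort keys %count;
```

Which strings are detected as proper names depends on the module's heuristics, so the tally may differ from a manual count.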

printPNstring

   printPNstring("oco")

getPN

printPN

  printPN("oco")

  printPN  - extracts the proper names from a text.
   -comp    joins certain names: Fermat + Pierre de Fermat = (Pierre de) Fermat
   -prof
   -e       treats "e" as part of the proper name, as in "Sebastiao e Silva"
   -em      treats "em Famalicão" as part of the proper name




Author

José João Almeida, <jj@di.uminho.pt>

Alberto Simões, <ambs@di.uminho.pt>

Bugs

NOTE: We know that documentation for the exported functions is nonexistent. We are working on it and expect to add it very soon.

Please report any bugs or feature requests to bug-lingua-pt-propernames@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE
