Lingua::LinkParser - Perl module implementing the Link Grammar Parser by Sleator, Temperley and Lafferty at CMU.


Lingua-LinkParser documentation. Contained in the Lingua-LinkParser distribution.


NAME

Lingua::LinkParser - Perl module implementing the Link Grammar Parser by Sleator, Temperley and Lafferty at CMU.

SYNOPSIS

  use Lingua::LinkParser;

  our $parser = Lingua::LinkParser->new;
  my $sentence = $parser->create_sentence("This is the turning point.");
  my @linkages = $sentence->linkages;
  # If there are NO LINKAGES, set min_null_count to a positive number:
  # $parser->opts('min_null_count' => 1);
  # See scripts/parse.pl for examples.
  foreach my $linkage (@linkages) {
      print $parser->get_diagram($linkage);
  }

DESCRIPTION

To quote the Link Grammar documentation, "the Link Grammar Parser is a syntactic parser of English, based on link grammar, an original theory of English syntax. Given a sentence, the system assigns to it a syntactic structure, which consists of a set of labeled links connecting pairs of words."

This module provides access to the parser API using Perl objects to easily analyze linkages. The module organizes data returned from the parser API into an object hierarchy consisting of, in order, sentence, linkage, sublinkage, and link. If this is unclear to you, see the several examples in the 'eg/' directory for a jumpstart on using these objects. The current Lingua::LinkParser module is based on version 4.2 of the Link Grammar parser API.

The objects within this module should not be confused with the types familiar to users of the Link Parser API. The objects used in this module reorganize the API data in a way more usable and friendly to Perl users, and do not exactly represent the types used in the API. For example, an object of class "Lingua::LinkParser::Sentence" does not directly correspond to the struct type "Sentence" of the API; rather, it is a Perl object that provides methods to access the underlying API functions.

This documentation should be supplemented with the extensive texts included with the Link Parser and on the Link Parser web site in order to understand its vernacular and general usage. Lingua::LinkParser::Definitions stores the basic link type documentation, and allows in-program retrieval of this information for convenience.

Note that most of the objects overload stringification, so printing an object yields a sensible text representation of that object, such as a linkage diagram.
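As a rough illustration of the mechanism behind this behavior, the sketch below shows how a Perl class can overload the string operator with the core overload pragma. The class Demo::Diagram and its as_string method are hypothetical names invented for this example; they are not part of Lingua::LinkParser, and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical class, used only to illustrate string overloading.
# It is NOT part of Lingua::LinkParser.
package Demo::Diagram;

# Route every use of the object as a string through as_string().
use overload '""' => \&as_string;

sub new {
    my ($class, @words) = @_;
    return bless { words => [@words] }, $class;
}

# Called automatically whenever the object is printed or interpolated.
sub as_string {
    my ($self) = @_;
    return join ' | ', @{ $self->{words} };
}

package main;

my $diagram = Demo::Diagram->new(qw(This is a test));
print "$diagram\n";    # prints the joined words, not a reference address
```

Without the overload, printing the object would show something like a blessed hash reference address rather than readable text.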

$parser = new Lingua::LinkParser( Lang => "en" )

This returns a new Lingua::LinkParser object, loads dictionary files, and sets basic configuration. This constructor no longer takes a full path to the dictionary files; they are expected to exist in the locations standard to the 4.2 parser distribution.

$parser->opts(OPTION_NAME => OPTION_VALUE, ...)

This sets the parser option OPTION_NAME to the value specified by OPTION_VALUE. A full list of these options is found at the end of this document, as well as in the Link Parser distribution documentation.

$sentence = $parser->create_sentence(TEXT)

Creates and assigns a sentence object (Lingua::LinkParser::Sentence) using the supplied value. This object is used in subsequent creation and analysis of linkages.

$sentence->length

Returns the number of words in the tokenized sentence, including the boundary words and punctuation.

$sentence->num_linkages

Returns the number of linkages found for $sentence.

$sentence->num_valid_linkages

Returns the number of valid linkages for $sentence.

$sentence->num_linkages_post_processed

Returns the number of linkages that were post-processed.

$sentence->null_count

Returns the number of null links used in parsing the sentence.

$sentence->num_violations

Returns the number of post processing violations for $sentence.

$sentence->get_word(NUM)

Returns the word (with original spelling) at position NUM, which is 1-indexed.

$linkage = $sentence->linkage(NUM)

Assigns a linkage object (Lingua::LinkParser::Linkage) for linkage NUM of sentence $sentence. NUM is 1-indexed.

@linkages = $sentence->linkages

Assigns a list of linkage objects for all linkages of $sentence.

$linkage->num_words

Returns the number of words within $linkage.

$linkage->get_words

Returns a list of words within $linkage.

$linkage->words

Returns a list of ::Word objects for $linkage.

$linkage->num_sublinkages

Returns the number of sublinkages for linkage $linkage.

$linkage->compute_union

Combines the sublinkages for $linkage into one, possibly with crossing links.

$linkage->violation_name

Returns the name of a rule violated by post-processing of the linkage.

$linkage->constituent_tree

Returns a Perl data structure that represents the constituent tree for the linkage. See scripts/constituent-tree.pl for an example of processing the tree.

$sublinkage = $linkage->sublinkage(NUM)

Assigns a sublinkage object (Lingua::LinkParser::Linkage::Sublinkage) for sublinkage NUM of linkage $linkage, which is 1-indexed.

@sublinkages = $linkage->sublinkages

Assigns an array of sublinkage objects.

$sublinkage->get_word(NUM)

Returns the word for the sublinkage at position NUM, 1-indexed.

$sublinkage->words

Returns a list of ::Word objects for $sublinkage.

Returns the number of links for sublinkage $sublinkage.

$word->text

Returns the post-parse word text.

$word->position

Returns the number for the word's position in a sentence.

Returns a list of link objects for the word.

Assigns a link object (Lingua::LinkParser::Link) for link NUM of sublinkage $sublinkage. NUM is 1-indexed.

Assigns an array of link objects.

Returns the number of domains for the sublinkage.

Returns a list of the domain names for $link.

Returns the "intersection" label for $link.

Returns the left label for $link.

Returns the right label for $link.

Returns the number of the left word for $link.

Returns the number of the right word for $link.

Returns the length of the link.

Only for link objects created via a word object, this returns the label for the link from the word object that created it.

Only for link objects created via a word object, this returns the word text which the link points *to* from the object that created it.

Only for link objects created via a word object, this returns the number of the word which the link points *to* from the object that created it.

$parser->get_diagram($linkage)

Returns an ASCII pretty-printed diagram of the specified linkage or sublinkage.

$parser->get_postscript($linkage, MODE)

Returns Postscript code for a diagram of the specified linkage or sublinkage.

$parser->get_domains($linkage)

Returns formatted ASCII text showing the links and domains for the specified linkage or sublinkage.

$parser->print_constituent_tree($linkage, MODE)

Returns an ASCII formatted tree displaying the constituent parse tree for $linkage. MODE is an integer with the following meanings: '1' will display the tree using a nested Lisp format, '2' specifies that a flat tree is displayed with brackets, and '0' results in no structure, a null string being returned.

OTHER FUNCTIONS

A few higher-level functions have also been provided.

@bigstruct = $sentence->get_bigstruct

Assigns a potentially large data structure merging all linkages/sublinkages/links for $sentence. This structure is an array of hashes, with a single array entry for each word in the sentence. This function is only useful for high-level analysis of sentence grammar; most applications should be served by using the above functions.

This array has the following structure:

 @bigstruct = ( { 'word'  => 'WORD',
                  'links' => {
                     'LINKTYPE_LINKAGENUM' => 'TARGETWORDNUM', ...
                  },
                },
                ... );

Where LINKAGENUM is the number of the linkage for $sentence, and LINKTYPE is the link type label. TARGETWORDNUM is the number of the word to which each link connects.
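A minimal sketch of taking such a key apart is given below, assuming the keys really have the 'LINKTYPE_LINKAGENUM' form described above. The helper split_link_key is a hypothetical name, and the sample entry is hand-written for illustration, not real parser output. Keys that carry no numeric suffix are passed through unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: split a key of the documented form
# 'LINKTYPE_LINKAGENUM' into (link type, linkage number).
# A key without a numeric suffix yields an undefined linkage number.
sub split_link_key {
    my ($key) = @_;
    return ($1, $2) if $key =~ /\A(.+)_(\d+)\z/;
    return ($key, undef);
}

# Hand-written entry in the documented shape (illustrative values only).
my %entry = (
    word  => 'society.n',
    links => { 'Dsu_1' => 6, 'Jp_1' => 4, 'A_2' => 7 },
);

for my $key (sort keys %{ $entry{links} }) {
    my ($type, $linkage_num) = split_link_key($key);
    printf "%-4s linkage %s -> word %d\n",
        $type,
        defined $linkage_num ? $linkage_num : '?',
        $entry{links}{$key};
}
```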

get_bigstruct() can be useful in finding, for example, all links for a given word in a given sentence:

   $sentence = $parser->create_sentence(
        "Architecture is present in nearly every civilized society.");
   @bigstruct = $sentence->get_bigstruct;

   print "\nword 8: ", $bigstruct[8]->{word}, "\n";

   while (($k,$v) = each %{$bigstruct[8]->{links}} )
        { print " $k => ", $bigstruct[$v]->{word}, "\n"; }

This would output (the order of the link lines may vary, since each() follows Perl's internal hash order):

   word 8: society.n
    Dsu => every.d
    Jp => in
    A => civilized.a

This signifies that the word "society" has links of type A (pre-noun adjective) to "civilized" (tagged 'a' for adjective), type Jp (preposition to object) to "in", and type Dsu (noun determiner, singular-mass) to "every", which is tagged 'd' for determiner.
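The same lookup can be tried without the parser installed. The sketch below uses a hand-built stand-in for the list returned by get_bigstruct(), filled in only at the positions that appear in the output above; describe_links is a hypothetical helper name. It sorts the link labels so that the result does not depend on hash ordering.

```perl
use strict;
use warnings;

# Hand-built stand-in for the get_bigstruct() result; only the entries
# needed here are populated, and the values are copied from the example
# output rather than produced by the parser.
my @bigstruct;
$bigstruct[4] = { word => 'in',          links => {} };
$bigstruct[6] = { word => 'every.d',     links => {} };
$bigstruct[7] = { word => 'civilized.a', links => {} };
$bigstruct[8] = { word  => 'society.n',
                  links => { Dsu => 6, Jp => 4, A => 7 } };

# Hypothetical helper: return "LABEL => WORD" strings for one word,
# sorted by link label for a stable order.
sub describe_links {
    my ($struct, $index) = @_;
    my $links = $struct->[$index]{links};
    return map {
        my $target = $struct->[ $links->{$_} ]{word};
        "$_ => $target";
    } sort keys %$links;
}

print "word 8: $bigstruct[8]{word}\n";
print " $_\n" for describe_links(\@bigstruct, 8);
```

With sorted keys the link lines appear in the order A, Dsu, Jp.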

The following example adds the usage of a Lingua::LinkParser::Definitions object to display the link definitions along with the link types. Note that this is an optional module, and is only really useful for human-readable display:

   use Lingua::LinkParser::Definitions qw(define);

   $sentence = $parser->create_sentence(
        "Architecture is present in nearly every civilized society.");
   @bigstruct = $sentence->get_bigstruct;

   # Index of the word to inspect; 8 is "society" in this sentence.
   my $i = 8;

   print "\nword $i: ", $bigstruct[$i]->{word}, "\n";

   while (($k,$v) = each %{$bigstruct[$i]->{links}} )
        { print " $k => ", $bigstruct[$v]->{word}, " (", define($k), ")\n"; }

Yielding:

   word 8: society.n
    Dsu => every.d (D connects determiners to nouns: "THE DOG chased A CAT and SOME BIRDS".  )
    Jp => in (J connects prepositions to their objects: "The man WITH the HAT is here".  )
    A => civilized.a (A connects pre-noun ("attributive") adjectives to following nouns: "The BIG DOG chased me", "The BIG BLACK UGLY DOG chased me".)

LINK PARSER OPTIONS

AUTHOR

Danny Brian, danny@brians.org

SEE ALSO

http://www.abisource.com/projects/link-grammar/

http://www.link.cs.cmu.edu/link/

