Text::Shoebox::Entry - class for Shoebox SF lexicon entries


Text-Shoebox documentation Contained in the Text-Shoebox distribution.

Index


Code Index:

NAME

Top

Text::Shoebox::Entry - class for Shoebox SF lexicon entries

SYNOPSIS

Top

  use Text::Shoebox::Lexicon;
  my $lex = Text::Shoebox::Lexicon->read_file( "haida.sf" );

  foreach my $entry ($lex->entries) {
    #
    # Each $entry is a Text::Shoebox::Entry object
    #
    my %e = $entry->as_list;
    print "Looky, stuff! [", %e, "]!\n";
  }

DESCRIPTION

Top

An object of this class represents an entry in an SF lexicon (Text::Shoebox::Lexicon).

An entry consists of a number of fields. Each field has two scalars, a key, and a value. The first field in an entry is considered the headword field, and its key must occur there and only there in that entry. There is no requirement on uniqueness of keys in the rest of the entry.

METHODS

Top

$entry = Text::Shoebox::Entry->new();
$entry = Text::Shoebox::Entry->new( 'foo' => 'bar', 'baz' => 'quux' );

The new method creates a new Text::Shoebox::Entry method. If you provide parameters, as in the second example, those are used as the contents of the new object.

Normally you don't need to expressly create objects of this class, as Text::Shoebox::Lexicon will create them as needed when you call a Text::Shoebox::Lexicon read_file or read_handle method.

$entry2 = $entry->copy

This returns a copy of the object in $entry.

@keys = $entry->keys;

This returns the names of all the keys (a/k/a fieldnames) in this entry. For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then $entry->keys will return the list ('hw', 'ex', 'ex').

@values = $entry->values;

This returns the values of all fields in this entry. For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then $entry->values will return the list ('foo', 'Things', 'Stuff').

$headword = $entry->headword;

This returns the headword value of this entry. This is basically a shortcut for ($entry->values)[0]

$headword_field = $entry->headword_field;

This returns the fieldname of this entry's headword field. This is basically a shortcut for ($entry->keys)[0]

$value = $entry->_(keyname)

Yes, this method really is called _ !

This gets all the values of the pairs that have the key keyname. How it returns those values (of which there may be none, one, or many) depends on context:

In list context, this simply returns the list of found values. E.g., if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then $entry->_('hw') returns the one-item list ('foo'), then $entry->_('zorp') returns the zero-item list (), and $entry->_('ex') returns the two-item list ('Things', 'Stuff').

In scalar context, this returns undef if no values were found. Otherwise it returns a magical arrayref containing the list of (one or more) found values. What's special about these arrayrefs is that if you treat one as a plain string, you get the useful value join "; ", @$array_ref), instead of nonsense like "ARRAY(0x1555294)".

(Internally, this is implemented just like Array::Autojoin, which see.)

my $num_pairs = $entry->pair_count;

This returns an integer expressing the number of pairs in this entry. It's basically the same as scalar($entry->keys).

($key, $val) = $entry->pair($n)

This returns the key and value of pair number $n for this entry. E.g., if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then $entry->pair(1) is the list ('ex', 'Things').

($key, $val, $k2, $v2) = $entry->pairs($n, $m)

This returns the key and value of pair number $n and the key and value of pair number $m for this entry. E.g., if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then $entry->pair(0,2) is the list ('hw' = 'foo', 'ex', "Stuff")>.

(Actually, $entry->pair(...) is just an alias to $entry->pairs(...).)

$true_or_false = $entry->are_keys_unique;

This returns true iff every keyname in this entry appears only once.

$entry->assert_keys_unique;

This dies unless $entry->are_keys_unique is true.

$true_or_false = $entry->is_null

This returns true iff this entry is empty. This is basically the same as 0 == $entry->pair_count.

$true_or_false = $entry->is_sane

This returns true iff this entry is non-null, contains no references, and if no keyname is undef or zero-length.

$entry->scrunch;

For all values in this entry, this compacts all whitespace, deletes leading and trailing whitespace, and deletes any pairs where the value is blank. (Where "blank" means undef, zero-length, or is all-whitespace.)

$entry->dump;

This prints (not returns!) a representation of this object's contents.

@it = $entry->as_list

This returns a list expressing the contents of $entry. For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this returns just that, ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff").

$them = $entry->as_arrayref

This returns an arrayref (probably blessed) to the contents of $entry. For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this will return an arrayred to just that list.

Note that this (and as_HoLS) is like the other as_thing methods, in that this doesn't return any sort of copy; it returns a reference to the entry's array itself -- if you do $them->[1] = 'x', then $entry's contents change to ('hw' => 'x', 'ex' => 'Things', 'ex' => "Stuff").)

(Internally, this method is implemented by simply returning $entry itself, since in the current implementation, $entry is just a blessed arrayref to the (k,v,k,v,...) list it contains.)

$h = $entry->as_hashref

This returns a hashref expressing the contents of $entry as a {key => value,...} hash, discarding duplicates. For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this returns { 'hw' => 'foo', 'ex' => "Stuff" }.

$hol = $entry->as_HoL

This returns a reference to a hash-of-lists expressing the contents of this entry, i.e., a reference to a hash where every value is an arrayref. Note that this doesn't destroy duplicates.

For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this returns { 'hw' => ['foo'], 'ex' => ['Things', 'Stuff'] }.

And there's a useful bit of magic added -- the arrayrefs aren't just plain arrayrefs, they're special arrayrefs (implemented just like Array::Autojoin) such that if you treat one as a plain string, you get the useful value join "; ", @$array_ref), instead of nonsense like "ARRAY(0x1555294)".

$hol = $entry->as_HoLS

This returns a hashref where every value is a reference to an array of scalar-refs to the value-slots in $entry. This is so you can alter $entry. (This and $entry->as_arrayref are really the only ways to alter an entry objects's content.)

This sounds (and is) very circuitous, but it's like this: If $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this returns { 'hw' => [$fooslot], 'ex' => [$thingsslot, $stuffslot] }, where if you do $$thingslot = 'gack', then $entry then becomes ('hw' => 'foo', 'ex' => 'gack', 'ex' => "Stuff").

$hol = $entry->as_doublets

This returns this entry as a list of "doublets" -- i.e., as a list of two-item arrayrefs.

For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this returns the list (['hw','foo'], ['ex','Things'], ['ex','Stuff']).

$xml_source = $entry->as_xml()
$xml_source = $entry->as_xml( TagNameHash )

This returns an XML representation of this entry's contents. In short, For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this returns this string:

    <hw>foo</hw>
    <ex>Things</ex>
    <ex>Stuff</ex>

The only details arise from the problem of how to turn $entry's keynames into XML tag names. For each key, if it's present in the optional TagNameHash hashref parameter, then that value ($tagnamehash->{$keyname}) is used; otherwise, $keyname itself is used, stripped of characters other than a-zA-Z0-9, colon, underscore, period, and dash.

$xml_source = $entry->as_xml_pairs()
$xml_source = $entry->as_xml_pairs( PairTagName, KeyAttrName, ValAttrName)

This returns an XML representation of this entry's contents. In short, For example, if $entry is ('hw' => 'foo', 'ex' => 'Things', 'ex' => "Stuff"), then this returns this string:

    <pair key="hw" value="foo" />
    <pair key="ex" value="Things" />
    <pair key="ex" value="Stuff" />

This avoids the problem of how to turn keynames into XML tagnames. If you don't like the choice of the pair tagname (by default, "pair") or the key attribute name (by default, "key"), or the value attribute name (by default, "value"), then you can specify new values as the parameters. So if you call $entry->as_xml_pairs( 'fee' , 'fie', 'Foe:Fum'), the return value is:

    <fee fie="hw" Foe:Fum="foo" />
    <fee fie="ex" Foe:Fum="Things" />
    <fee fie="ex" Foe:Fum="Stuff" />

COPYRIGHT

Top

AUTHOR

Top

Sean M. Burke, sburke@cpan.org

I hasten to point out, incidentally, that I am not in any way affiliated with the Summer Institute of Linguistics.


Text-Shoebox documentation Contained in the Text-Shoebox distribution.
require 5;
package Text::Shoebox::Entry;
use strict;
use vars qw(@ISA $Debug $VERSION);
use integer;
use Text::Shoebox 1.02 ();
use Carp ();

$Debug = 0 unless defined $Debug;
BEGIN {
  $VERSION = "1.02";
}
my $Array_Class;

unless($Text::Shoebox::Lexicon::VERSION) { require Text::Shoebox::Lexicon; }

###########################################################################

sub new {  # Text::Shoebox is free to not use this, for speed's sake
  my $class = shift;
  $class = ref($class) || $class; # be an object or a class method
  print "New object in class $class\n" if $Debug;
  if(@_ == 1 and ref($_[0]) eq 'ARRAY') {
    # listref form -- call as: Text::Shoebox::Entry->new([foo => 'bar']);
    return bless $_[0], $class;
  } else {
    # list form -- call as: Text::Shoebox::Entry->new(foo => 'bar');
    return bless [@_], $class;
  }
}

sub copy {
  my $original = $_[0];
  Carp::croak("Text::Shoebox::entry is strictly an object method.")
    unless ref $original;
  return bless( [@$original], ref($original) );
  # bless into the same class as the original
  # presumably a deep copy isn't necessary!
}

#--------------------------------------------------------------------------

sub keys {
  my @out;
  for(my $i = 0; $i < @{$_[0]}; $i += 2) { push @out, $_[0][$i] }
  return @out;
}

sub values {
  my @out;
  for(my $i = 1; $i < @{$_[0]}; $i += 2) { push @out, $_[0][$i] }
  return @out;
}

#--------------------------------------------------------------------------


sub headword       { @{$_[0]} ? $_[0][1] : undef } # simply the first value
sub headword_field { @{$_[0]} ? $_[0][0] : undef } # simply the first key

sub Text::Shoebox::Entry::_ {
  my($self, $key) = @_;
  return unless defined $key;

  my @out;
  for(my $i = 0; $i < @$self; $i += 2) {
    push @out, $self->[$i+1] if $key eq $self->[$i];
  }
  if(wantarray) {
    return @out;
  } else {
    return unless @out;
    return bless \@out, $Array_Class;
  }
}



sub pair_count { return @{$_[0]} / 2; }

sub pair { (shift)->pairs(@_) } #alias

sub pairs { # also good for accessing one pair, or none!
  # get pair #3 (assuming counting from 0) :  ($k,$v) = $e->pairs(3);
  my $o = shift;
  map { @{$o}[$_ * 2, $_ * 2 + 1] } @_;
   # map to slices. Better be legal offsets!
   # e.g., 3 maps to @{$o}[6,7]
}

#--------------------------------------------------------------------------

sub are_keys_unique {
  # returns true iff the keys are unique in this entry.
  # i.e., if no headword occurs twice (or more)
  return 1 if @{$_[0]} < 2; # can't have collisions with just one key!

  my %seen;
  for(my $i = 0; $i < @{$_[0]}; $i += 2) {
    return 0 if $seen{$_[0][$i]}++;
  }
  return 1;
}

sub assert_keys_unique {
  return $_[0] if $_[0]->are_keys_unique;
  my $e = shift;
  my @k = $e->keys;
  my %seen;
  for my $k (@k) { ++$seen{$k} }
  for my $k (@k) { $k = uc $k if $seen{$k} > 1 }
  Carp::croak "Entry $e \"$$e[1]\" has duplicate keys: [@k]\nAborting";
}



sub is_null { return( @{$_[0]} == 0 ) }

sub is_sane {
  my $e = $_[0];
  return 0 unless @$e; # empty entries are not sane
  for(my $i = 0; $i < @{$_[0]}; $i += 2) { # scan keys
    return 0 unless defined $e->[$i] and length $e->[$i];
     # all keys have to be defined and be non-null
    return 0 if ref $e->[$i] or ref $e->[$i+1];
     # no references anywhere!
  }
  return 1;
}

#--------------------------------------------------------------------------

sub scrunch {
  my $e = $_[0];
  for(my $i = 1; $i < @$e; $i += 2) { # scan keys
    unless( defined $e->[$i] and $e->[$i] =~ m/\S/ ) {
      splice @$e, $i-1, 2;  # nix K=>V where V is null or all-whitespace
      $i-=2;
    }
    $e->[$i] =~ s/^\s+//s;
    $e->[$i] =~ s/\s+$//s;
    $e->[$i] =~ s/[ \t]*[\n\r]+[ \t]*/ /g;
     # remove newlines and any whitespace around them
  }
  return $e;
}

#--------------------------------------------------------------------------

sub dump {
  my $e = $_[0];

  print "Entry $e contains:\n";

  my $safe;
  my $toggle = 0;
  foreach my $v (@$e) {
    ($safe = $v) =~ 
            s<([^\x20\x21\x23\x27-\x3F\x41-\x5B\x5D-\x7E\xA1-\xFE])>
                          <$Text::Shoebox::p{$1}>eg;
    print(
      ($toggle ^= 1) ? qq{  $safe = } : qq{"$safe"\n} 
    );
  }
  print "\n";
  return $e;
}

#--------------------------------------------------------------------------

sub as_list     { return  @{$_[0]}  }
sub as_arrayref { return    $_[0]   }
sub as_hashref  { return {@{$_[0]}} }





sub as_HoL {
  my $e = $_[0];
  my %h;
  for(my $i = 0; $i < @$e; $i += 2) {
    push @{ $h{ $e->[$i] } ||= []}, $e->[$i+1];
  }
  foreach my $v (CORE::values %h) { bless $v, $Array_Class }
  \%h;
}


sub as_HoLS { # ref to a hash of list of refs to each of the value slots
  my $e = $_[0];
  my %h;
  for(my $i = 0; $i < @$e; $i += 2) {
    push @{
           $h{ $e->[$i] } ||= []
          },  \$e->[$i + 1];
  }
  \%h;
}



sub as_doublets {
  # returns this entry...
  #     (hw => 'shash', english => 'bear')
  # as this...
  #     ([hw => 'shash'], [english => 'bear'])
  my @out;
  for(my $i = 0; $i < @{$_[0]}; $i += 2) {
    push @out, [ @{ $_[0] }[$i, $i+1] ];
  }
  return @out;
}



sub as_xml {
  # Yes, VERY simpleminded.  And note that the result is NOT wrapped
  #  in an <entry>...</entry> or anything.

  # returns this entry...
  #     (hw => 'shash', english => 'bear')
  # as this...
  #     " <hw>shash</hw>\n<english>bear</english>\n"

  # Consider this entry more as a suggestion, and as a debugging tool, than
  #  anything else.

  # Optional first parameter: a reference to a hash mapping key names
  #  to tags.  E.g., $e->as_xml({hw => 'headword', english => 'gloss'})
  # will give you this:
  #     " <headword>shash</headword>\n<gloss>bear</gloss>\n"

  my $map = ref($_[1]) ? $_[1] : {};

  my(@out, $k, $v);
  for(my $i = 0; $i < @{$_[0]}; $i += 2) {
    ($k,$v) = @{$_[0]}[$i, 1 + $i];

    if(exists $map->{$k}){
      $k = $map->{$k};
    } else {
      # spiff up the key name so it's an okay GI (tag name)
      $k =~ tr<-._:a-zA-Z0-9><_>cd; # Yes, this is conservative
      if(length $k) {
        $k = '_' . $k unless $k =~ m<^[_:a-zA-Z]>s;
        # prefix unsafe things.
      } else { # to avoid a null GI
        $k = 'NULL';
      }
    }

    $v =~ s/&/&amp;/g;
    $v =~ s/</&lt;/g ;
    $v =~ s/>/&gt;/g ;
    push @out, " <$k>$v</$k>\n";
  }
  return join '', @out;
}

sub as_xml_pairs {
  # A bit less pointless.  And note that the result is still not wrapped
  #  in an <entry>...</entry> or anything.

  # Returns this entry...
  #     (hw => 'shash', english => 'bear')
  # as this...
  #     " <pair key="hw" value="shash" /><pair key="english" value="bear"/>\n"

  # Consider this entry more as a suggestion, and as a debugging tool, than
  #  anything else.

  # Calling format: $e->as_xml_pairs(TAGNAME, KEYNAME, VALUENAME)
  #  TAGNAME defaults to 'pair'.
  #  KEYNAME defaults to 'key'.
  #  VALUENAME defaults to 'value'.

  my($o, $gi, $key_name, $value_name) = @_;
  $gi         ||= 'pair'  ;
  $key_name   ||= 'key'   ;
  $value_name ||= 'value' ;

  my(@out, $k, $v);
  for(my $i = 0; $i < @$o; $i += 2) {
    ($k,$v) = @{$o}[$i, 1 + $i];
    foreach my $x ($k, $v) {
      # NB: Doesn't entitify apostrophes.  No point, really.
      $x =~ s/&/&amp;/g;
      $x =~ s/"/&quot;/g;
      $x =~ s/</&lt;/g;
      $x =~ s/>/&gt;/g;
      $x =~ s<([\n\t\cm\cj])>
                          <'&#'.(ord($1)).';'>seg;
       # turn newlines into character references
    }
    push @out, " <$gi $key_name=\"$k\" $value_name=\"$v\" />\n"
  }

  return join '', @out;
}

###########################################################################
{
  # Basically just the guts of Array::Autojoin:

  package Text::Shoebox::Entry::_Autojoin;
  $Array_Class = __PACKAGE__;

  use overload(

    '""' => sub {    join '; ', @{$_[0]}},

    '0+' => sub {0 + ( $_[0][0] || 0  ) },
     # stringifies and then just numerifies, but feh.

    'fallback' => 1,  # turn on cleverness

    'bool', => sub {  # true iff there's any true items in it
      for (@{$_[0]}) { return 1 if $_ };
      return '';
    },

    '.=' => sub {  # sure, why not.
      if(@{$_[0]}) { $_[0][-1] .= $_[1] } else { push @{$_[0]}, $_[1] }
      $_[0];
    },  # but can't overload ||= or the like

  );
}
###########################################################################
1;

__END__