Bio::Phylo::Parsers::Fasta - Parser used by Bio::Phylo::IO, no serviceable parts inside


Bio-Phylo documentation Contained in the Bio-Phylo distribution.

NAME

Bio::Phylo::Parsers::Fasta - Parser used by Bio::Phylo::IO, no serviceable parts inside

DESCRIPTION

A very simplistic FASTA file parser. To use it, pass an argument that specifies the data type of the FASTA records to the parse function, for example:

 use Bio::Phylo::IO 'parse';

 my $project = parse(
    -type       => 'dna', # or rna, protein
    -format     => 'fasta',
    -file       => 'infile.fa',
    -as_project => 1
 );

For each FASTA record, the first "word" on the definition line is used as the name of the produced datum object. The entire definition line, including the leading ">", is stored as a generic attribute:

 $datum->set_generic( 'fasta_def_line' => $line )

So you can retrieve it by calling:

 my $line = $datum->get_generic('fasta_def_line');
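The naming convention described above can be illustrated without Bio::Phylo at all. The following is a minimal, self-contained sketch, not the module's actual implementation: the helper name parse_fasta_records and the hash keys name, def_line and seq are hypothetical and chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Phylo): split FASTA text into
# records, following the naming convention described above.
sub parse_fasta_records {
    my ($text) = @_;
    my @records;
    my $current;
    for my $line ( split /\n/, $text ) {
        if ( $line =~ /^>(\S+)/ ) {

            # the first "word" after '>' becomes the record name;
            # the whole definition line is kept alongside it
            $current = { name => $1, def_line => $line, seq => '' };
            push @records, $current;
        }
        elsif ($current) {

            # any other line extends the sequence of the current record
            $current->{seq} .= $line;
        }
    }
    return @records;
}

my @records =
  parse_fasta_records(">seq1 Homo sapiens\nACGT\nTTGA\n>seq2\nGGCC\n");
printf "%s\t%s\t%s\n", @{$_}{qw(name def_line seq)} for @records;
```

In this sketch the first record ends up with the name seq1, while its full definition line stays available separately, which mirrors the split between the datum name and the 'fasta_def_line' generic attribute.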

BioPerl parses definition lines more thoroughly, extracting GI numbers and similar identifiers. If you need that, use Bio::SeqIO from the bioperl-live distribution. You can then pass the resulting Bio::Seq objects to Bio::Phylo::Matrices::Datum->new_from_bioperl to convert them into Bio::Phylo::Matrices::Datum objects.

SEE ALSO

Bio::Phylo::IO

The fasta parser is called by the Bio::Phylo::IO object. Look there to learn more about parsing.

Bio::Phylo::Manual

Also see the manual: Bio::Phylo::Manual and http://rutgervos.blogspot.com

CITATION

If you use Bio::Phylo in published research, please cite it:

Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63

REVISION

 $Id: Fasta.pm 1660 2011-04-02 18:29:40Z rvos $


# $Id: Fasta.pm 1660 2011-04-02 18:29:40Z rvos $
package Bio::Phylo::Parsers::Fasta;
use strict;
use base 'Bio::Phylo::Parsers::Abstract';
use Bio::Phylo::Util::Exceptions 'throw';

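# Reads FASTA records from the input handle and returns them as a matrix
# of datum objects of the requested data type.
sub _parse {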
    my $self = shift;
    my $fh   = $self->_handle;
    my $fac  = $self->_factory;
    my $type = $self->_args->{'-type'}
      or throw 'BadArgs' => 'No data type specified!';
    my $matrix = $fac->create_matrix( '-type' => $type );
    my ( $seq, $datum );
    while (<$fh>) {
        chomp;
        my $line = $_;
        if ( $line =~ /^>(\S+)/ ) {    # definition lines start with '>'
            my $name = $1;
            if ( $seq && $datum ) {
                $matrix->insert( $datum->set_char($seq) );
            }
            $datum = $fac->create_datum(
                '-type'    => $type,
                '-name'    => $name,
                '-generic' => { 'fasta_def_line' => $line }
            );
            $seq = '';
        }
        else {
            $seq .= $line;
        }
    }

    # Within the loop, insertions are triggered by encountering the next
    # definition line, so the last $datum must be inserted explicitly after
    # the loop. The guard avoids calling a method on undef for empty input.
    $matrix->insert( $datum->set_char($seq) ) if $datum;
    return $matrix;
}

# podinherit_insert_token

1;