| Bio-Phylo documentation | Contained in the Bio-Phylo distribution. |
Bio::Phylo::Parsers::Fasta - Parser used by Bio::Phylo::IO, no serviceable parts inside
A very simplistic FASTA file parser. To use it, pass an argument that specifies the data type of the FASTA records to the parse function, for example:
use Bio::Phylo::IO 'parse';
my $project = parse(
    -type       => 'dna',        # or rna, protein
    -format     => 'fasta',
    -file       => 'infile.fa',
    -as_project => 1
);
For each FASTA record, the first "word" on the definition line is used as the name of the produced datum object. The entire definition line, including the leading ">", is stored as a generic attribute, as if by:
$datum->set_generic( 'fasta_def_line' => $line )
So you can retrieve it by calling:
my $line = $datum->get_generic('fasta_def_line');
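The naming rule above can be illustrated without Bio::Phylo at all. The following is a minimal plain-Perl sketch, not the parser itself: the helper name read_fasta_records and the hash keys name, def_line and seq are hypothetical, and unlike the module's own code it anchors the ">" match at the start of the line.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Phylo: splits FASTA text into
# records, keeping the first word of each definition line as the name
# and the whole definition line (including '>') next to the sequence.
sub read_fasta_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my ( @records, $current );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S+)/ ) {
            # a definition line starts a new record
            $current = { name => $1, def_line => $line, seq => '' };
            push @records, $current;
        }
        elsif ($current) {
            # sequence lines are concatenated onto the current record
            $current->{seq} .= $line;
        }
    }
    close $fh;
    return @records;
}

my @records =
  read_fasta_records(">seq1 Homo sapiens sample\nACGT\nTTGA\n>seq2\nGGCC\n");
for my $record (@records) {
    print join( "\t", $record->{name}, $record->{def_line}, $record->{seq} ), "\n";
}
```

Here the name of the first record is "seq1", while its def_line keeps the full text of the line, which is the same split the parser makes between the datum name and the 'fasta_def_line' generic attribute.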
BioPerl parses definition lines more thoroughly, extracting GIs and similar identifiers, so if you need that, use Bio::SeqIO from the bioperl-live distribution. You can then pass the Bio::Seq objects that Bio::SeqIO produces to Bio::Phylo::Matrices::Datum->new_from_bioperl to turn them into Bio::Phylo::Matrices::Datum objects.
The fasta parser is called by the Bio::Phylo::IO object. Look there to learn more about parsing.
Also see the manual: Bio::Phylo::Manual and http://rutgervos.blogspot.com
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63
$Id: Fasta.pm 1660 2011-04-02 18:29:40Z rvos $
# $Id: Fasta.pm 1660 2011-04-02 18:29:40Z rvos $
package Bio::Phylo::Parsers::Fasta;
use strict;
use base 'Bio::Phylo::Parsers::Abstract';
use Bio::Phylo::Util::Exceptions 'throw';

sub _parse {
    my $self = shift;
    my $fh   = $self->_handle;
    my $fac  = $self->_factory;
    my $type = $self->_args->{'-type'}
      or throw 'BadArgs' => 'No data type specified!';
    my $matrix = $fac->create_matrix( '-type' => $type );
    my ( $seq, $datum );
    while (<$fh>) {
        chomp;
        my $line = $_;
        if ( $line =~ />(\S+)/ ) {
            my $name = $1;
            if ( $seq && $datum ) {
                $matrix->insert( $datum->set_char($seq) );
            }
            $datum = $fac->create_datum(
                '-type'    => $type,
                '-name'    => $name,
                '-generic' => { 'fasta_def_line' => $line }
            );
            $seq = '';
        }
        else {
            $seq .= $line;
        }
    }

    # within the loop, insertions are triggered by encountering the next definition line,
    # hence, the last $datum needs to be inserted explicitly when we leave the loop
    $matrix->insert( $datum->set_char($seq) );
    return $matrix;
}

# podinherit_insert_token
1;