Bio::FASTASequence - Perl extension for Bioinformatics. Parsing sequence information.


Contained in the Bio-FASTASequence distribution.

NAME

Bio::FASTASequence - Perl extension for Bioinformatics. Parsing sequence information.

SYNOPSIS

  use Bio::FASTASequence;
  my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
  ~;
  my $seq = Bio::FASTASequence->new($fasta);

ABSTRACT

  Bio::FASTASequence is a Perl module that parses information out of a FASTA sequence.

DESCRIPTION

This Perl module is a small utility that simplifies common bioinformatics tasks. It extracts several pieces of information from a given FASTA sequence, such as:

* accession number
* description
* sequence itself
* length of sequence
* CRC64 checksum (as used by SWISS-PROT)
* seq2xml
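All of the fields listed above come from a single FASTA record: a header line starting with '>' followed by one or more sequence lines. The following is a minimal sketch of how such a string could be split into its header and a continuous sequence, and how the sequence length follows from that. split_fasta_record is a hypothetical helper, not part of Bio::FASTASequence, and it assumes exactly one record per string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::FASTASequence): split a single
# FASTA record into its header (without the leading '>') and its sequence.
sub split_fasta_record {
    my ($record) = @_;
    my @lines  = grep { /\S/ } split /\r?\n/, $record;   # drop blank lines
    my $header = shift @lines;
    die "not a FASTA record\n"
        unless defined $header && $header =~ s/^\s*>//;
    # join the sequence lines and strip all whitespace (including indentation)
    (my $sequence = join '', @lines) =~ s/\s+//g;
    return ($header, $sequence);
}

my $fasta = ">sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).\n"
          . "QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY\n"
          . "YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS\n";

my ($header, $sequence) = split_fasta_record($fasta);
print "header: $header\n";
print "length: ", length($sequence), "\n";    # prints "length: 120"
```

Removing all whitespace from the sequence lines means indented input, as in the EXAMPLE section, yields the same sequence as unindented input.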

METHODS

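new

	my $seq = Bio::FASTASequence->new($fasta);

creates a new Bio::FASTASequence object from a string containing a sequence in FASTA format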

getAccessionNr

	my $accession = $seq->getAccessionNr();

returns the accession number of the FASTA sequence

getDescription

	my $description = $seq->getDescription();

returns the description given in the header (first) line of the FASTA record, without the accession number

getSequence

	my $sequence = $seq->getSequence();

returns the sequence

getCrc64

	my $crc64_checksum = $seq->getCrc64();

returns the CRC64 checksum of the sequence. This checksum is the same as the CRC64 checksum used by SWISS-PROT
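The documentation only states that the checksum matches SWISS-PROT's and that the routine is based on the SWISS::CRC64 module. The sketch below assumes that scheme is a table-driven, bit-reflected CRC-64 using the ISO polynomial x^64 + x^4 + x^3 + x + 1 (reflected constant 0xD800000000000000), with initial value 0 and no final XOR, printed as 16 upper-case hexadecimal digits. crc64_hex is a hypothetical name, and the code requires a perl built with 64-bit integers.

```perl
use strict;
use warnings;
use Config;

# The 64-bit arithmetic below needs a perl built with 64-bit integers.
die "64-bit integers required\n" unless $Config{ivsize} >= 8;

# Reflected form of the CRC-64-ISO polynomial x^64 + x^4 + x^3 + x + 1.
my $POLY = 0xD8 << 56;    # 0xD800000000000000

# Precompute the 256-entry lookup table (one entry per possible byte value).
my @CRC_TABLE;
for my $i (0 .. 255) {
    my $crc = $i;
    for (1 .. 8) {
        $crc = ($crc & 1) ? (($crc >> 1) ^ $POLY) : ($crc >> 1);
    }
    $CRC_TABLE[$i] = $crc;
}

# Hypothetical helper: CRC64 (initial value 0, no final XOR) of a sequence
# string, returned as 16 upper-case hexadecimal digits.
sub crc64_hex {
    my ($sequence) = @_;
    my $crc = 0;
    for my $byte (unpack 'C*', $sequence) {
        $crc = $CRC_TABLE[($crc ^ $byte) & 0xFF] ^ ($crc >> 8);
    }
    return sprintf '%016X', $crc;
}

print crc64_hex('QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY'
              . 'YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS'), "\n";
```

If these assumptions hold, the printed value should agree with the crc64 entry shown for the same sequence in the structure example of this document.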

addDBRef

	$seq->addDBRef(DB, REFERENCE_AC);

adds a cross-reference to an entry in another database

DB is the name of the referenced database

REFERENCE_AC is the accession number in the referenced database

seq2file

	$seq->seq2file(FILENAME);

writes the sequence to a file. FILENAME is the path of the file in which the sequence is stored.

allIndexesOf

	my $indexes = $seq->allIndexesOf(EXPR);

returns a reference to an array that contains every index at which EXPR occurs in the sequence
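The documentation does not say whether EXPR is matched literally or as a regular expression; the EXAMPLE section adds 1 to each returned index, which suggests 0-based positions. A minimal sketch under the assumption that EXPR is a literal substring, using Perl's built-in index and reporting overlapping occurrences. all_indexes_of is a hypothetical name, not the module's method.

```perl
use strict;
use warnings;

# Hypothetical helper: return a reference to an array holding every 0-based
# position at which $expr occurs (literally) in $sequence, overlaps included.
sub all_indexes_of {
    my ($sequence, $expr) = @_;
    my @positions;
    return \@positions unless length $expr;    # avoid an endless loop on ''
    my $pos = index $sequence, $expr;
    while ($pos != -1) {
        push @positions, $pos;
        $pos = index $sequence, $expr, $pos + 1;    # resume one past the last hit
    }
    return \@positions;
}

my $sequence = 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY'
             . 'YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS';

my $cys = all_indexes_of($sequence, 'C');
print join(', ', @$cys), "\n";    # prints "21, 34, 93"
```

Resuming the search one position after each hit is what allows overlapping matches (for example 'AA' in 'AAAA' at 0, 1 and 2); advancing by the length of EXPR instead would report only non-overlapping ones.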

getSequenceLength

	my $length = $seq->getSequenceLength();

returns the length of the sequence

getDBRefs

	my $hashref = $seq->getDBRefs();

returns a hash reference. The hash contains all database references, keyed by database name, e.g. {'SWISS-PROT' => 'P01815'}
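Because getDBRefs hands back a plain hash reference, the usual Perl hash idioms apply to its result. The sketch below imitates the documented behaviour with an ordinary hash. add_db_ref is a hypothetical stand-in for addDBRef, and it assumes one accession per database, so a second reference for the same database replaces the first; the module itself may store references differently.

```perl
use strict;
use warnings;

my %dbrefs;    # database name => accession number in that database

# Hypothetical stand-in for $seq->addDBRef(DB, REFERENCE_AC).
sub add_db_ref {
    my ($db, $accession) = @_;
    $dbrefs{$db} = $accession;    # assumption: one accession per database
}

add_db_ref('SWISS-PROT', 'P01815');

my $hashref = \%dbrefs;           # plays the role of $seq->getDBRefs()
for my $db (sort keys %$hashref) {
    print "$db: $hashref->{$db}\n";    # prints "SWISS-PROT: P01815"
}
```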

getFASTA

	my $fasta_sequence = $seq->getFASTA();

returns the sequence in FASTA format
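The exact layout produced by getFASTA is not documented. A minimal sketch of assembling a FASTA record, assuming the header is written after a '>' and the sequence is wrapped at 60 residues per line, the width used by the example input in this document. to_fasta and the default width are assumptions, not the module's API.

```perl
use strict;
use warnings;

# Hypothetical helper: build a FASTA record from a header and a sequence,
# wrapping the sequence at $width residues per line (60 by default).
sub to_fasta {
    my ($header, $sequence, $width) = @_;
    $width ||= 60;
    my @lines = $sequence =~ /(.{1,$width})/g;    # fixed-width chunks, last one shorter
    return join "\n", ">$header", @lines, '';     # trailing '' adds the final newline
}

print to_fasta('sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).',
               'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY'
             . 'YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS');
```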

EXAMPLE

	use strict;
	use warnings;
	use Bio::FASTASequence;

	my $fasta = qq~>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
	QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKY
	YNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS
	~;

	my $seq = Bio::FASTASequence->new($fasta);

	print 'The sequence of ', $seq->getAccessionNr(), ' is ', $seq->getSequence(), "\n";

	# allIndexesOf returns a reference to an array of (0-based) indexes
	my $cys = $seq->allIndexesOf('C');
	print 'This sequence contains ', scalar(@$cys), " cysteines at the following positions:\n";
	# add 1 to each index to report 1-based residue positions
	print join(', ', map { $_ + 1 } @$cys), "\n";

ADDITIONAL INFORMATION

accepted formats

This module can parse FASTA header lines in the following formats:

>P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).
>IPI:IPI00166553|REFSEQ_XP:XP_290586|ENSEMBL:ENSP00000331094|TREMBL:Q8N3H0 T Hypothetical protein
>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).
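How the module splits each of these headers into accession number and description is not documented beyond the structure shown below for the sp| example. The sketch handles the three layouts under explicit assumptions: for "sp|AC|ENTRY_NAME description" the accession is the second field and the description is the text after the first space (this matches the documented example); for the IPI layout the accession is the first |-separated field without its "IPI:" prefix, and the remaining DB:AC fields are treated as database cross-references; for the plain layout the accession is the first word and the description is the rest of the line. parse_header is a hypothetical helper, not part of Bio::FASTASequence.

```perl
use strict;
use warnings;

# Hypothetical helper: split a FASTA header into accession number, description
# and database cross-references. The per-layout rules are assumptions.
sub parse_header {
    my ($header) = @_;
    $header =~ s/^>//;
    my ($id, $description) = split /\s+/, $header, 2;
    $description = '' unless defined $description;
    my %dbrefs;

    # sp|P01815|HV2B_HUMAN Ig heavy chain ...
    if ($id =~ /^sp\|([^|]+)\|/) {
        return ($1, $description, \%dbrefs);
    }

    # IPI:IPI00166553|REFSEQ_XP:XP_290586|... : first field is the accession,
    # the remaining DB:AC fields are kept as cross-references (assumption)
    if ($id =~ /\|/) {
        my ($first, @refs) = split /\|/, $id;
        (my $accession = $first) =~ s/^[^:]*://;
        for my $ref (@refs) {
            my ($db, $ac) = split /:/, $ref, 2;
            $dbrefs{$db} = $ac if defined $ac;
        }
        return ($accession, $description, \%dbrefs);
    }

    # P02656 APC3_HUMAN Apolipoprotein ... : first word is the accession
    return ($id, $description, \%dbrefs);
}

for my $h ('>sp|P01815|HV2B_HUMAN Ig heavy chain V-II region COR - Homo sapiens (Human).',
           '>IPI:IPI00166553|REFSEQ_XP:XP_290586|ENSEMBL:ENSP00000331094|TREMBL:Q8N3H0 T Hypothetical protein',
           '>P02656 APC3_HUMAN Apolipoprotein C-III precursor (Apo-CIII).') {
    my ($accession, $description) = parse_header($h);
    print "$accession: $description\n";
}
```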

structure

The structure of the hash for the example above (as printed by Data::Dumper) is:

	$VAR1 = {
	         'seq_length' => 120,
	         'accession_nr' => 'P01815',
	         'text' => 'QVTLRESGPALVKPTQTLTLTCTFSGFSLSSTGMCVGWIRQPPGKGLEWLARIDWDDDKYYNTSLETRLTISKDTSRNQVVLTMDPVDTATYYCARITVIPAPAGYMDVWGRGTPVTVSS',
	         'crc64' => '158A8B29AE7EEB98',
	         'dbrefs' => {},
	         'description' => 'Ig heavy chain V-II region COR - Homo sapiens (Human).'
	       }

If you find that something is missing, please contact me.

BUGS

There are no known bugs. If you experience any problems, please contact me.

SEE ALSO

http://modules.renee-baecker.de (not available yet; this site is under construction)

The CRC64 routine is based on the SWISS::CRC64 module.

MODIFICATIONS

More FASTA description (header) line formats are accepted.

AUTHOR

Renee Baecker, <module@renee-baecker.de>

Feel free to contact me.

COPYRIGHT AND LICENSE
