Fry::Lib::BioPerl - Commandline enables common tasks for sequence and alignment objects for Bio::Perl modules.


Fry-Lib-BioPerl documentation Contained in the Fry-Lib-BioPerl distribution.

Index


Code Index:

NAME

Top

Fry::Lib::BioPerl - Commandline enables common tasks for sequence and alignment objects for Bio::Perl modules.

DESCRIPTION

Top

The main point of this library is to view,retrieve and create sequences and alignments. All objects created are indexed by the shell as Fry::Obj objects. See Fry::Lib::Default for commands that you can perform on these objects.

Sequence and alignment objects created are automatically named with an abbreviation followed by an incremental number ie seq1 and seq2 for sequences and aln1 and aln2 for alignments. Not all classes have unique objects. Bio::DB::GenBank only has one object and its getSeqById command assume this. This can of course be changed. Classes that have commands to create and index more than one of its objects are Bio::Seq derived classes, Bio::SimpleAlign, Bio::SeqIO and Bio::AlignIO.

DIRECTIONS

Top

To load this library read the Using Libraries subsection of Fry::Shell.

After loading this library, &_initLib will be called. This creates a Bio::SeqIO object for sequence output, a Bio::AlignIO for alignment output and reads in a sequence file to create Bio::Seq::RichSeq objects. The default sequence file assumes you're in the installation directory of a bio-perl bundle since it uses a file in t/. If any one the default actions don't work, just comment it out in &_initLib.

REQUIREMENTS

Top

You should install the bioperl-1.4 bundle. Time::HiRes and the bioperl-run bundle are used by some of the commands but comment them out if you don't have them and want to try other commands.

COMMANDS

Top

	Note: Brief explanation of what some names mean in a command's usage:
		seq- sequence object or its Fry::Obj id
		aln- a Bio::SimpleAlign object or its Fry::Obj id
		seqio- Bio::SeqIO object or its Fry::Obj id
		alnio- a Bio::AlignIO object or its Fry::Obj id

	Constructors: Creates objects to be used by other commands

		readSeq($file,$format,$object_name): Reads a sequence file, creates a
			Bio::SeqIO object and then creates and indexes its sequence objects.
		readAln($file,$format,$object_name): Reads an alignment file, creates a
			Bio::AlignIO object and then creates and indexes its alignment objects.
		startClustalw(%options): Creates a
			Bio::Tools::Run::Alignment::ClustalW object with given options.
			Default options are specified by %clustal_params.
		startTCoffee(%options): Creates a
			Bio::Tools::Run::Alignment::TCoffee object with given options.
			Default options are specified by %tcoffee_params.
		startDB(%options): Creates a Bio::DB::GenBank object with given
			options. Default options are defined by %db_genbank_params.

	IO
		alnioFormat($format): Changes format of alignment output.
		seqioFormat($format): Changes format of sequence output.

	Sequence based
		listSeqs($seqio): Displays sequence ids belonging to a file.
		writeSeq(@seq): Displays sequences via SeqIO.
			`listSeqs seq1 seq3`
		translateSeq($seq): Translates sequence, creating new sequence object.
		revcomSeq($seq): Reverse complements sequence, creating new sequence object.
		getSeqById($id): Calls &get_Seq_by_id on a Bio::DB::GenBank object to
			retrieve a sequence.

	General Alignment
		writeAln(@aln): Displays alignments via AlignIO.
		displayAln(@aln): Custom display of alignments.
		getAlnSeqs(@aln): Creates sequence objects from alignment objects.
		Alignment

	Special Alignment
		clustalAlign(@seq): Calls &align on ClustalW object, creating an alignment object.
		tcoffeeAlign(@seq): Calls &align on TCoffee object, creating an alignment object.
		compareAlign(@seq): Times alignments made by TCoffee and ClustalW and
			displays resulting alignments.
	Other
		libCmdAttr($cmd_attribute): See a command attribute for only :BioPerl
			commands.

MOTIVATION AND MAINTENANCE

Top

I wrote this library for a Bioinformatics class. Since I won't to be doing any bioinformatics in the forseeable future, it's unlikely I'll add more to this library. However, since I'm very interested in making a fully featured shell, more functionality will appear in this shell to make writing commands easier. Ideally, I'd like most of these library commands to be replaced by a config file.

If you want to expand this library, send me patches. I can also pass over ownership of this module. You can also write your own library. But beware that the library and scripting APIs are still changing.

SEE ALSO

Top

Fry::Shell, Bio::Perl

AUTHOR

Top

Me. Gabriel that is. I welcome feedback and bug reports to cldwalker AT chwhat DOT com . If you like using perl,linux,vim and databases to make your life easier (not lazier ;) check out my website at www.chwhat.com.

COPYRIGHT & LICENSE

Top


Fry-Lib-BioPerl documentation Contained in the Fry-Lib-BioPerl distribution.

package Fry::Lib::BioPerl;

use strict;
#use Benchmark qw/:all/;
use Time::HiRes qw/tv_interval gettimeofday/;
use Bio::SeqIO;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::Tools::Run::Alignment::TCoffee;
use Bio::DB::GenBank;
#use Bio::Graphics;
#Bio::Seq::RichSeq;
#Bio::Graphics::Feature;
	
our $VERSION = 0.15;
our $Verbose = 1;
$Bio::Root::Root::DEBUG = 0;

#these are defaults
our $seqio_file = "$ENV{HOME}/.cpanplus/5.8.1/build/bioperl-1.4/t/data/test.genbank";
#our $seqio_file = "t/data/test.genbank";
our $seqio_format = "genbank";
our $alnio_format =  "clustalw";
our $alnio_file = "$ENV{HOME}/.cpanplus/5.8.1/build/bioperl-1.4/t/data/testaln.aln";
#our $alnio_file = "t/data/testaln.aln";
our $default_seqio = "seqio1";
our $default_alnio = "alnio1";
our %clustal_params = (qw/matrix BLOSUM/);
our %tcoffee_params = ();
our %db_genbank_params = ();
my $seqcount = 0;
my $alncount = 0;
#hackish variable
our %temp;

sub _default_data {
	return {
		cmds=>{
			readSeq=>{qw/a brs/,d=>'Read a sequence from a file',
				u=>'$format$file$obj'},
			readAln=>{qw/a bra/,d=>'Read an alignment from a file',
				u=>'$format$file$obj'},
			startClustalw=>{qw/a bsc/,d=>'Initialize clustalw',u=>'%options'},
			startTCoffee=>{qw/a bst/,d=>'Initialize tcoffee',u=>'%options'},
			startDB=>{qw/a bsd/,d=>'Initialize database',u=>'%options'},
			clustalAlign=>{qw/a bca/,d=>'Align with clustalw',u=>'@seq'},
			tcoffeeAlign=>{qw/a bta/,d=>'Align with tcoffee',u=>'@seq'},	
			compareAlign=>{qw/a bcom/,d=>'Compare two alignment methods',
				u=>'@seq'},
			alnioFormat=>{qw/a baf/,d=>'Change alnioout format',u=>'$format'},
			seqioFormat=>{qw/a bsf/,d=>'Change seqioout format',u=>'$format'},
			listSeqs=>{qw/a bls/,d=>'Read all sequences from a file',
				u=>'$seqio'},
			#nextSeq=>{qw/a ,/,d=>'Get the next Seq'},
			writeAln=>{qw/a bwa/,d=>'Write alignments',u=>'@aln'},
			displayAln=>{qw/a bda/,d=>'Display alignment info',u=>'@aln'},
			getAlnSeqs=>{qw/a bas/,d=>'Save alignment seqs',u=>'$aln'},
			writeSeq=>{qw/a bws/,d=>'Write sequences',u=>'@seq'},
			translateSeq=>{qw/a bts/,d=>'Translate sequences',u=>'$seq'},
			revcomSeq=>{qw/a bts/,d=>'Reverse complement sequences',u=>'$seq'},
			getSeqById=>{qw/a bgsi/,d=>'Get sequence by id',u=>'$id'},
			#featureImage=>{qw/a bfi/,d=>''},
			libCmdAttr=>{qw/a lca/,d=>"Display a library's cmds' attributes",
				u=>'$attr$lib'},
		},
		objs=>{
			#seqio1=>{}
		}
	}	
}
sub _initLib {
	my ($cls) = @_;
	#reads in sequence file
	$cls->newSeqIoInObj;
	#sequence output
	$cls->newSeqIoObj('seqioout',-fh=>\*STDOUT,-format=>$seqio_format);
	#alignment output
	$cls->newAlnIoObj('alnioout',-fh=>\*STDOUT,-format=>$alnio_format);
}

#new*Obj
	#these methods create objects and index them as Fry::Obj objects as well
	sub newSeqIoInObj {
		my ($cls,%arg) = @_;

		#defaults
		my $objId = delete $arg{obj} || $default_seqio;
		$arg{-file} = $arg{-file} || $seqio_file;
		$arg{-format} = $arg{-format} || $seqio_format;

		$cls->newSeqIoObj($objId,%arg);

		#index seqs of seqio
		my ($id,@ids);
		my @newSeqs = $cls->ripStream($cls->obj->get($objId,'obj'));
		for (@newSeqs) { 
			$id = $cls->newSeqObj($_);
			push(@ids,$id);
		}
		$cls->obj->set($objId,seqs=>\@ids);
	}
	sub newSeqIoObj {
		my ($cls,$objId,%arg) = @_;
		#print Dumper \%arg;

		my $obj =  $cls->seqioNew(%arg) ; 
		$cls->obj->setOrMake($objId=>$obj);
	}
	sub newSeqObj {
		my ($cls,$obj) = @_;
		$seqcount++;
		my $id = "seq$seqcount";
		$cls->obj->manyNew($id=>{obj=>$obj}); 
		$cls->view("Created sequence '$id'\n") if ($Verbose);
		return $id;
	}
	sub newAlnObj {
		my ($cls,$obj) = @_;
		$alncount++;
		my $id = "aln$alncount";
		$cls->obj->manyNew($id=>{obj=>$obj}); 
		$cls->view("Created alignment '$id'\n") if ($Verbose);
		return $id;
	}
	sub newAlnIoObj {
		my ($cls,$objId,%arg) = @_;
		my $obj =  $cls->alnioNew(%arg) ; 
		$cls->obj->setOrMake($objId=>$obj);
	}
	sub newAlnIoInObj {
		my ($cls,%arg) = @_;
		my $objId = delete $arg{obj} || $default_alnio;
		$arg{-file} = $arg{-file} || $alnio_file;
		delete $arg{-format};
		#$arg{-format} = $arg{-format} || "clustalw";

		$cls->newAlnIoObj($objId,%arg);

		#index seqs of seqio
		my ($id,@ids);
		my @newAlns = $cls->ripStream($cls->obj->get($objId,'obj'));
		for (@newAlns) {
			$id = $cls->newAlnObj($_);
			push(@ids,$id);
		}
		$cls->obj->set($objId,alns=>\@ids);
	}
#wrappers around classes to easily substitute your own subclasses of any of these
	sub seqioNew { shift; return Bio::SeqIO->new(@_) }
	sub alnioNew { shift; return Bio::AlignIO->new(@_) }
	sub biodbNew { shift; return Bio::DB::GenBank->new(@_) }
	sub clustalwNew { shift; return Bio::Tools::Run::Alignment::Clustalw->new(@_) }
	sub tcoffeeNew { shift; return Bio::Tools::Run::Alignment::TCoffee->new(@_) }
	sub objObj { shift->obj->get($_[0],'obj') }
#commands
	#h: will probably break in other releases
	sub libCmdAttr {
		my ($cls,$attr) = @_;
		my $lib = "Fry::Lib::BioPerl";

		my @ids =  @{$cls->lib->get($lib,'cmds') };
		$cls->printGeneralAttr('cmd',$attr,@ids);
	}
#constructors
	sub readSeq {
		#d: initializes $seqio
		my ($cls,$format,$file,$objId) = @_;
		$cls->newSeqIoInObj(-file=>$file,-format=>$format,obj=>$objId);
	}
	sub readAln {
		my ($cls,$format,$file,$objId) = @_;
		$cls->newAlnIoInObj(-file=>$file,-format=>$format,obj=>$objId);
	}
	sub startClustalw {
		my ($cls,%arg) = @_;
		#$cls->verify_params(\%arg);
		my %params = (%clustal_params,%arg);
		my $obj = $cls->clustalwNew(%params);
		$cls->obj->manyNew('cw'=>{obj=>$obj}); 
	}
	sub startDB {
		my ($cls,%arg) = @_;
		my %params = (%db_genbank_params,%arg);
		my $obj = $cls->biodbNew(%params);
		$cls->obj->manyNew('db1'=>{obj=>$obj}); 
	}
	sub startTCoffee {
		my ($cls,%arg) = @_;
		my %params = (%tcoffee_params,%arg);
		my $obj = $cls->tcoffeeNew(%params);
		$cls->obj->manyNew('tc'=>{obj=>$obj}); 
	}
#Alignment
	##$tcoffee
	sub tcoffeeAlign {
		my ($cls,@seqs) = @_;
		@seqs = $cls->idToObj(@seqs);

		my $t0= [ gettimeofday()];
		my $alnobj = $cls->obj->get('tc','obj')->align(\@seqs);
		my $elapsed = tv_interval($t0);
		$cls->newAlnObj($alnobj);
		$temp{tc} = {time=>$elapsed,alnid=>"aln$alncount"};

		$cls->view("This alignment took $elapsed seconds\n") if $Verbose;
	}
	##$clustalw
	sub clustalAlign {
		my ($cls,@seqs) = @_;
		@seqs = $cls->idToObj(@seqs);

		my $t0= [ gettimeofday()];
		my $alnobj = $cls->obj->get('cw','obj')->align(\@seqs);
		my $elapsed = tv_interval($t0);
		$cls->newAlnObj($alnobj);
		$temp{cw} = {time=>$elapsed,alnid=>"aln$alncount"};

		$cls->view("This alignment took $elapsed seconds\n") if $Verbose;
	}
	##macros
	sub compareAlign {
		my ($cls,@seqs) = @_;
		@seqs = $cls->idToObj(@seqs);

		$cls->clustalAlign(@seqs);
		$cls->tcoffeeAlign(@seqs);

		$cls->view("Alignment from Clustalw\n");
		$cls->view("------------------------------------\n");
		$cls->displayAln($temp{cw}{alnid});
		$cls->view("Time: ",$temp{cw}{time},"\n");
		$cls->view("\nAlignment from TCoffee\n");
		$cls->view("------------------------------------\n");
		$cls->displayAln($temp{tc}{alnid});
		$cls->view("Time: ",$temp{tc}{time},"\n");
	}
#IO
	##$alignio
	sub alnioFormat {
		my ($cls,$format) = @_;
		$format ||= $alnio_format;
		$cls->newAlnIoObj('alnioout',-fh=>\*STDOUT,-format=>$format);
	}
	##$seqio
	sub seqioFormat {
		my ($cls,$format) = @_;
		$format ||= $seqio_format;
		$cls->newSeqIoObj('seqioout',-fh=>\*STDOUT,-format=>$format);
	}
#Sequences
	##$biodb
	sub getSeqById {
		my ($cls,$id) = @_;
		my $method = "get_Seq_by_id";
		my $seq = $cls->objObj('db1')->$method($id);
		$cls->newSeqObj($seq);
	}
	##$seq
	sub listSeqs {
		#d: readSeqs
		my ($cls,$seqioId) = @_;
		$seqioId ||= $default_seqio;
		my @seqs = @{$cls->obj->get($seqioId,'seqs') || []};
		my $output = "Sequence ids of $seqioId are:\n";
		$output .= $cls->seqIds(map {$cls->obj->get($_,'obj')} @seqs);
		$cls->view($output);
	}
	sub translateSeq {
		my ($cls,$seqId) = @_;
		my $seqObj = $cls->obj->get($seqId,'obj')->translate;
		$cls->newSeqObj($seqObj);
	}
	sub revcomSeq {
		my ($cls,$seqId) = @_;
		my $seqObj = $cls->obj->get($seqId,'obj')->revcom;
		$cls->newSeqObj($seqObj);
	}
	sub writeSeq {
		my ($cls,@seqs) = @_;
		@seqs = $cls->idToObj(@seqs);
		for my $seq (@seqs) {
			$cls->obj->get(seqioout=>'obj')->write_seq($seq);
		}
	}
#Alignments
	##$aln
	#add,remove,indexSeq,slice,remove_gaps,map_chars
	sub displayAln {
		my ($cls,@alns) = @_;
		@alns = $cls->idToObj(@alns);
		my %methods = (score=>'Score',consensus_string=>'Consensus string',
			length=>'Length',no_residues=>'Number of residues',
			no_sequences=>'Number of sequences',percentage_identity=>'Percentage identity');
		my $output;

		for my $aln (@alns) {
			for my $m (sort keys %methods) {
				$output .= $methods{$m}.": ". $aln->$m
				."\n";
			}
		}
		$cls->view($output);
	}
	sub getAlnSeqs {
		my ($cls,$alnId) = @_;
		my $aln = $cls->objObj($alnId);
		my @ids;

		for my $seq ($aln->each_seq) { push(@ids,$cls->newSeqObj($seq)) }
		#for my $seq ($aln->each_seq) {
			#my $obj = Bio::Seq->new(-display_id=>$seq->{display_id},-seq=>$seq->{seq});
			##print "s: ",$seq->{seq},"\n";
			#push(@ids,$cls->newSeqObj($obj)) 
		#}
		$cls->obj->set($alnId,seqs=>\@ids);
	}
	sub writeAln {
		my ($cls,@alns) = @_;
		@alns = $cls->idToObj(@alns);
		for my $aln (@alns) {
			$cls->obj->get(alnioout=>'obj')->write_aln($aln);
		}
	}
	##misc
	sub seqIds {
		my ($cls,@seqs) = @_;
		my $output;

		if (@seqs > 0) {
			for my $seq  (@seqs) {
				$output .= sprintf "%s\n", $seq->display_id;
			}
		}
		else { $output = "No sequences to display." }
		return $output;
	}
	sub idToObj {
		my ($cls,@idOrObj) = @_;
		my @return;
		if (ref $idOrObj[0]) {
			@return = @idOrObj;
		}
		else { @return = map { $cls->obj->get($_,'obj') } @idOrObj }
		wantarray ? @return : $return[0];
	}
	#d: all Bio::*IO can use this
	sub ripStream { my $fh = $_[1]->fh; my @seqs = <$fh>; return @seqs }
1;
__END__
#my old crap code
	sub nextSeq {
		my ($cls,$seqio) = @_;

		#td?: history of used seq
		$seqio ||= $default_seqio;
		my $seq = $cls->obj->get($seqio,'obj')->next_seq or do {warn("failed next_seq(): $@") ; return};
		#my $seq = $seqio->next_seq or do {warn("failed next_seq(): $@") ; return};
		#print $seq->seq,"\n"; $out->write_seq($seq);
		$cls->writeSeq($seq);
	}
	sub parseArgs { 
		my $cls = shift;
		my ($seqio1,$seqio2);
		if (ref $_[0] =~ /Bio::/) {
		}
		#elsif ($_[0] =~ /(file|format)[12]/) {
		#}
		else { ($seqio1,$seqio2) = ($in,$out) }
		return ($seqio1,$seqio2);
	}
	sub featureImage {
		my ($cls,$seq) = @_;
		$seq = $cls->idToObj($seq);
 my @features = $seq->all_SeqFeatures;

 # sort features by their primary tags
 my %sorted_features;
 for my $f (@features) {
   my $tag = $f->primary_tag;
   push @{$sorted_features{$tag}},$f;
 }

 my $wholeseq = Bio::SeqFeature::Generic->new(-start=>1,-end=>$seq->length);

 my $panel = Bio::Graphics::Panel->new(
				      -length    => $seq->length,
 				      -key_style => 'between',
 				      -width     => 800,
 				      -pad_left  => 10,
 				      -pad_right => 10,
 				      );
 $panel->add_track($wholeseq,
 		  -glyph => 'arrow',
 		  -bump => 0,
 		  -double=>1,
 		  -tick => 2);

 $panel->add_track($seq,
 		  -glyph  => 'generic',
 		  -bgcolor => 'blue',
 		  -label  => 1,
 		 );

 # general case
 my @colors = qw(cyan orange blue purple green chartreuse magenta yellow aqua);
 my $idx    = 0;
 for my $tag (sort keys %sorted_features) {
   my $features = $sorted_features{$tag};
   $panel->add_track($features,
 		    -glyph    =>  'generic',
 		    -bgcolor  =>  $colors[$idx++ % @colors],
 		    -fgcolor  => 'black',
 		    -font2color => 'red',
 		    -key      => "${tag}s",
 		    -bump     => +1,
 		    -height   => 8,
 		    -label    => 1,
 		    -description => 1,
 		   );
 }
	}