genomics - Perl extension for various DNA sequence analysis tools


Contained in the genomics distribution.

NAME

genomics - Perl extension for various DNA sequence analysis tools

SYNOPSIS

  use genomics::FilterSeq;

DESCRIPTION

This module condenses a FASTA-formatted file to a 'unique' list of sequences. This is done recursively by Hash{key} lookups. A unique key is sampled from each sequence and listed in a %HASH, thereby making all sequences with identical keys equivalent. The sequences are scanned +- the scanning window for other keys. Duplicates are squashed based on key prevalence or 5'->3' directionality.

EXPORT

Usage: Call the subroutine by passing, in order:

1. \%SEQUENCE - a reference to a hash with $SEQUENCE{$name}=$sequence structure
2. $filter_start - the starting position in the sequence at which to grab a key
3. $filter_length - the length of the key (shorter keys produce more 'pruned' sets)
4. $filter_window - window +- to scan for keys
5. $filter_type - "M" = leave ambiguous sequences, "T" = force ambiguous to most 3' position, "F" = force ambiguous to most 5' position
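The first argument is a reference to a hash keyed by sequence name. The following is a minimal sketch of building such a hash from FASTA input. Note that `read_fasta` is a hypothetical helper and not part of this distribution, and taking the first word of each header line as the sequence name is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of the genomics distribution): read FASTA
# text from a filehandle into a hash of name => sequence.
sub read_fasta {
    my ($fh) = @_;
    my (%SEQUENCE, $name);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;          # strip the line ending
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>(\S+)/) {      # header: assume the first word is the name
            $name = $1;
            $SEQUENCE{$name} = '';
        }
        elsif (defined $name) {        # sequence line: append to the current record
            $line =~ s/\s+//g;
            $SEQUENCE{$name} .= $line;
        }
    }
    return \%SEQUENCE;
}

# Example using an in-memory filehandle; a real script would open a file.
my $fasta = ">est1 sample\nACGTAC\nGTTT\n>est2\nacgtacgaaa\n";
open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my $SEQUENCE_R = read_fasta($fh);
close $fh;

print "$_\t$SEQUENCE_R->{$_}\n" for sort keys %$SEQUENCE_R;
```

The returned reference can be passed directly as the first argument in place of \%SEQUENCE.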

  my ( $RefKeyHash_R, $RefKeyHashSeq_R, $EST_PER_SITE_R, $SITES_CHOSEN_R, $STATS_R )
      = genomics::FilterSeq( \%SEQUENCE, $filter_start, $filter_length, $filter_window, $filter_type );

The subroutine returns the following:

1. $RefKeyHash_R - a reference to a hash containing references to arrays of sequence names, by key [ $hash{$key} = \@names ]
2. $RefKeyHashSeq_R - similar, only returns the condensed sequence by key
3. $EST_PER_SITE_R - a reference to a hash containing the key count value (number of keys represented)
4. $SITES_CHOSEN_R - a reference to a hash containing the key count value (number of sites represented)
5. $STATS_R - a reference to a hash of various counts

  my $seq_count               = $$STATS_R{"seq_count"};
  my $Refseq_ID_count         = $$STATS_R{"Refseq_ID_count"};
  my $position_squashed_count = $$STATS_R{"position_squashed_count"};
  my $key_count               = $$STATS_R{"key_count"};
  my $my_length_ave           = $$STATS_R{"length_ave"};

  print "Out of $seq_count sequences ($my_length_ave), $Refseq_ID_count Id's were placed into $position_squashed_count sites (exact key), further reduced to $key_count sites by positional iteration<BR>\n";

  foreach ( keys(%$RefKeyHash_R) ) {
      print "$_ ";
      my $my_name_arr = $$RefKeyHash_R{$_};    # array reference of sequence names
      print @$my_name_arr;
      print "\n";
      print ${ $$RefKeyHashSeq_R{$_} };        # condensed sequence for this key
      print "\n";
  }
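The exact-key step outlined in the DESCRIPTION can be pictured with a short, self-contained sketch. This is not the module's own code. `group_by_key` is a hypothetical helper, the 0-based start position is an assumption, and the +- window scan and the "M"/"T"/"F" handling are deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT the module's implementation: group sequence names
# by the substring found at a fixed start position and length.
sub group_by_key {
    my ($seq_r, $start, $length) = @_;
    my %names_by_key;
    for my $name (sort keys %$seq_r) {
        my $seq = $seq_r->{$name};
        next if length($seq) < $start + $length;    # too short to hold a full key
        my $key = substr($seq, $start, $length);    # 0-based start is an assumption
        push @{ $names_by_key{$key} }, $name;       # identical keys share one list
    }
    return \%names_by_key;
}

# Made-up example sequences for illustration only.
my %SEQUENCE = (
    est1 => 'TTACGTACGTAA',
    est2 => 'GGACGTACGTCC',
    est3 => 'TTACGAAAAAAA',
);

my $groups = group_by_key(\%SEQUENCE, 2, 8);
print "$_: @{ $groups->{$_} }\n" for sort keys %$groups;
```

With a key length of 8, est1 and est2 share the key ACGTACGT and est3 stands alone. Shortening the key to 3 puts all three sequences under ACG, which illustrates why shorter keys give more heavily pruned sets.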

SEE ALSO


AUTHOR

ltboots, <jesse.salisbury@cpan.org>

COPYRIGHT AND LICENSE
