Bio::ASN1::EntrezGene::Indexer - Indexes NCBI Entrez Gene files.


Contained in the Bio-ASN1-EntrezGene distribution.


NAME

Bio::ASN1::EntrezGene::Indexer - Indexes NCBI Entrez Gene files.

SYNOPSIS

  use Bio::ASN1::EntrezGene::Indexer;

  # creating & using the index is just a few lines
  my $inx = Bio::ASN1::EntrezGene::Indexer->new(
    -filename => 'entrezgene.idx',
    -write_flag => 'WRITE'); # needed for make_index call, but if opening 
                             # existing index file, don't set write flag!
  $inx->make_index('Homo_sapiens', 'Mus_musculus', 'Rattus_norvegicus');
  my $seq = $inx->fetch(10); # Bio::Seq obj for Entrez Gene #10
  # alternatively, if one prefers just a data structure instead of objects
  my $hash = $inx->fetch_hash(10); # a hash produced by Bio::ASN1::EntrezGene
                                   # that contains all data in the Entrez Gene record

  # the input files (e.g. 'Homo_sapiens') can be downloaded from the
  # NCBI Entrez Gene FTP site, DATA/ASN/Mammalia directory

PREREQUISITE

Bio::ASN1::EntrezGene, plus a Bioperl version that contains Stefan Kirov's entrezgene.pm and all of its dependencies.

INSTALLATION

Same as for Bio::ASN1::EntrezGene.

DESCRIPTION

Bio::ASN1::EntrezGene::Indexer is a Perl indexer for NCBI Entrez Gene genome databases. It scans ASN.1-formatted Entrez Gene records and stores the file position of each record in a way compliant with the Bioperl standard (in fact, it is a subclass of Bioperl's index objects).

Note that this module does not parse records, because it needs to run fast and grabs only the gene ids. To parse records, use Bio::ASN1::EntrezGene or, better yet, Bio::SeqIO with format 'entrezgene'.
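
The idea of indexing by file position can be illustrated with a minimal sketch in core Perl. This is not the module's actual implementation. It assumes that each record starts with a line beginning "Entrezgene ::= {" and that the first "geneid <number>" after that line is the record's id. The subroutine names build_offset_index and read_record are hypothetical, and the two-record input is made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Map each gene id to the byte offset of its record's first line.
# Assumption: a record starts with a line beginning "Entrezgene ::= {".
sub build_offset_index {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my (%offset_of, $record_start);
    while (1) {
        my $pos  = tell($fh);     # byte offset before reading the line
        my $line = <$fh>;
        last unless defined $line;
        $record_start = $pos if $line =~ /^Entrezgene ::= \{/;
        if (defined $record_start && $line =~ /\bgeneid (\d+)/) {
            $offset_of{$1} = $record_start;
            undef $record_start;  # ignore later geneid mentions in this record
        }
    }
    close $fh;
    return \%offset_of;
}

# Seek to a stored offset and return the raw text of that one record.
sub read_record {
    my ($path, $offset) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    seek($fh, $offset, 0) or die "Cannot seek in $path: $!";
    my $text = <$fh>;             # the record's opening line
    while (defined(my $line = <$fh>)) {
        last if $line =~ /^Entrezgene ::= \{/;   # next record begins
        $text .= $line;
    }
    close $fh;
    return $text;
}

# Demonstration on a tiny made-up file with two records.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "Entrezgene ::= {\n  track-info {\n    geneid 10\n  }\n}\n",
                "Entrezgene ::= {\n  track-info {\n    geneid 12\n  }\n}\n";
close $tmp_fh;

my $index = build_offset_index($tmp_path);
print read_record($tmp_path, $index->{12});
```

Because only the record boundary and the gene id are matched, the scan never builds a parse tree, which is the trade-off described above: fast indexing, with full parsing deferred until a record is fetched.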

Version 1.07 of this module takes 21 seconds to index the human genome Entrez Gene file (April 5, 2005 download) on one 2.4 GHz Intel Xeon processor.

SEE ALSO

For details on the various parsers I generated for Entrez Gene, and for example scripts that use or benchmark the modules, please see http://sourceforge.net/projects/egparser/. Those other parsers, etc., are included in the V1.05 download.

AUTHOR

Dr. Mingyi Liu <mingyi.liu@gpc-biotech.com>

COPYRIGHT

CITATION

Liu, M and Grigoriev, A (2005) "Fast Parsers for Entrez Gene" Bioinformatics. In press

OPERATING SYSTEMS SUPPORTED

Any OS that Perl & Bioperl run on.

METHODS

fetch

  Parameters: $geneid - id for the Entrez Gene record to be retrieved
  Example:    my $seq = $indexer->fetch(10); # get Entrez Gene #10
  Function:   fetch the data for the given Entrez Gene id.
  Returns:    A Bio::Seq object produced by Bio::SeqIO::entrezgene
  Notes:      One needs to have Bio::SeqIO::entrezgene installed before 
                calling this function!

fetch_hash

  Parameters: $geneid - id for the Entrez Gene record to be retrieved
  Example:    my $hash = $indexer->fetch_hash(10); # get Entrez Gene #10
  Function:   fetch a hash produced by Bio::ASN1::EntrezGene for the given
                Entrez Gene id.
  Returns:    A data structure containing all data items from the Entrez
                Gene record.
  Notes:      Alternative to fetch()

_file_handle

  Title   : _file_handle
  Usage   : $fh = $index->_file_handle( INT )
  Function: Returns an open filehandle for the file
            index INT.  On opening a new filehandle it
            caches it in the @{$index->_filehandle} array.
            If the requested filehandle is already open,
            it simply returns it from the array.
  Example : $first_file_indexed = $index->_file_handle( 0 );
  Returns : ref to a filehandle
  Args    : INT
  Notes   : This function is copied from Bio::Index::Abstract. Once that module
              changes its file handle code, as I do below, to fit perl 5.005_03,
              this sub will be removed from this module.

