| Bio-Phylo documentation | Contained in the Bio-Phylo distribution. |
Bio::Phylo::NeXML::DOM - XML DOM support for Bio::Phylo
    use Bio::Phylo::NeXML::DOM;
    use Bio::Phylo::IO qw( parse );
    Bio::Phylo::NeXML::DOM->new(-format => 'twig');
    my $project = parse( -file=>'my.nex', -format=>'nexus' );
    my $nex_twig = $project->doc();
This module adds to_dom methods to the Bio::Phylo::NeXML::Writable
classes. These methods return NeXML-valid objects suitable for document
object model manipulation. DOM formats currently available are XML::Twig
and XML::LibXML. For any Bio::Phylo::NeXML::Writable object, use to_dom
in place of to_xml to create DOM nodes.
The doc() method is also added to the Bio::Phylo::Project class. It
returns a NeXML document as a DOM object populated by the current contents
of the Bio::Phylo::Project object.
The NeXML parsing/writing capability of Bio::Phylo goes a long way
towards wider adoption of this useful standard.
However, while Bio::Phylo can write NeXML-valid XML, the way in
which it does this natively is somewhat hard-coded and therefore
restricted, and is essentially oriented toward text file output. As
such, there is a mismatch between the sophisticated Bio::Phylo data
structure and its own ability to manipulate and serialize that
structure in sophisticated but interoperable ways. Finer manipulations
of XML-represented data are possible through a variety of Perl
packages that can store and control XML according to a document
object model (DOM). Many of these packages allow extremely flexible
computation over large datasets stored in XML format, and admit the
use of XML-related facilities such as XPath and XSLT programmatically.
The purpose of Bio::Phylo::NeXML::DOM is to introduce integrated DOM
object creation and manipulation to Bio::Phylo, both to make DOM
computation in Bio::Phylo more convenient, and also to provide a
platform for potentially more sophisticated Bio::Phylo modules to
come.
Besides the notion that DOM capability should be optional for the user,
there are two main design ideas. First, for each Bio::Phylo object
that can be parsed/written as NeXML (i.e., for each
Bio::Phylo::NeXML::Writable object), we provide an analogous method
for creating a representative DOM object, or element. These elements
are aggregatable in a DOM document object, whose native stringifying
method can be used to generate valid NeXML.
Second, we allow flexibility and extensibility in the choice of the
underlying DOM package, while maintaining a consistent DOM interface
that is similar in semantic and syntactic style to the accessors and
mutators that act on the Bio::Phylo objects themselves. This is
achieved through the DOM::DocumentI and DOM::ElementI interfaces,
which define a minimal subset of DOM accessors and mutators, their
inputs and outputs. Concrete instances of these interface classes
provide the bindings between the abstract methods and their
counterparts in the desired DOM implementation. Currently, there are
bindings for two popular packages, XML::Twig and XML::LibXML.
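The interface-plus-bindings arrangement described above can be pictured with a minimal sketch. All package names in it (My::ElementI, My::Element::Hash) are hypothetical stand-ins and are not part of Bio::Phylo. The real bindings delegate to XML::Twig or XML::LibXML rather than to a plain hash, and the sketch does no XML escaping.

```perl
use strict;
use warnings;

# Hypothetical abstract interface: names the methods every binding must supply.
package My::ElementI;

sub new {
    my ( $class, %args ) = @_;
    return bless { tag => $args{-tag}, attr => {} }, $class;
}

# Abstract methods die unless a concrete binding overrides them.
for my $method (qw(get_tagname set_attributes to_xml_string)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$method" } = sub {
        die ref(shift) . " does not implement $method\n";
    };
}

# Hypothetical concrete binding backed by a plain hash.
package My::Element::Hash;
our @ISA = ('My::ElementI');

sub get_tagname { $_[0]{tag} }

sub set_attributes {
    my ( $self, %attr ) = @_;
    @{ $self->{attr} }{ keys %attr } = values %attr;
    return $self;
}

# Serialize as an empty element; attribute values are not escaped (sketch only).
sub to_xml_string {
    my $self = shift;
    my $attr = join '',
      map { qq{ $_="$self->{attr}{$_}"} } sort keys %{ $self->{attr} };
    return "<$self->{tag}$attr/>";
}

package main;

my $elt = My::Element::Hash->new( -tag => 'otu' );
$elt->set_attributes( id => 'otu1', label => 'Homo sapiens' );
my $xml = $elt->to_xml_string;
print "$xml\n";
```

Calling code only ever sees the interface methods, so a second binding could be swapped in without changing it. That is the property the two real bindings are meant to preserve.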
Another priority was simplicity of use; most of the details remain
under the hood in practice. The Bio/Phylo/Util/DOM.pm file defines the
to_dom() method for each XMLWritable package, as well as the
Bio::Phylo::NeXML::DOM package proper. The DOM object is a
factory that is used to create Element and Document objects; it is an
inside-out object that subclasses Bio::Phylo. To curb the
proliferation of method arguments, a DOM factory instance (set by the
latest invocation of Bio::Phylo::NeXML::DOM->new()) is maintained in
a package global. This is used by default for object creation with DOM
methods if a DOM factory object is not explicitly provided in the
argument list.
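The package-global default can be illustrated with a small sketch. My::DOM and describe() are hypothetical names chosen for illustration. Only the pattern follows the description above: the latest new() wins, and a default is created lazily when none exists yet.

```perl
use strict;
use warnings;

# Hypothetical factory class; mirrors the pattern, not the real implementation.
package My::DOM;

our $DOM;    # the most recently constructed factory

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { format => $args{-format} || 'twig' }, $class;
    return $DOM = $self;    # remember the latest instance as the default
}

sub get_format { $_[0]{format} }

# Lazily create a default factory if none has been constructed yet.
sub get_dom { $DOM ||= __PACKAGE__->new }

package main;

# Hypothetical consumer: uses an explicit factory when given, else the default.
sub describe {
    my ($dom) = @_;
    $dom ||= My::DOM->get_dom;
    return 'using ' . $dom->get_format;
}

my $implicit = describe();    # no factory yet, so a default is created
My::DOM->new( -format => 'libxml' );
my $latest   = describe();    # picks up the most recent factory
my $explicit = describe( My::DOM->new( -format => 'twig' ) );
print "$_\n" for $implicit, $latest, $explicit;
```

One consequence of this design is that constructing a factory anywhere in a program changes the default for all later calls that omit the factory argument. Passing the factory explicitly avoids that coupling.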
The underlying DOM implementation is set with the DOM factory
constructor's single argument, -format. Even this can be left out;
the default implementation is XML::Twig, which is already required
by Bio::Phylo. Thus, for example, one can use the DOM to convert
a Nexus file to a DOM representation as follows:
    use Bio::Phylo::NeXML::DOM;
    use Bio::Phylo::IO qw( parse );
    Bio::Phylo::NeXML::DOM->new();
    my $project = parse( -file=>'my.nex', -format=>'nexus' );
    my $nex_twig = $project->doc();
    # The end.
Underlying DOM packages are loaded at runtime as specified by the
-format argument. Packages for unused formats do not need to be
installed.
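Runtime loading of this kind is commonly done with a string require. The sketch below shows the general technique only. The %module_for map and load_format() are hypothetical, and two core modules stand in for XML::Twig and XML::LibXML so that the example runs on any Perl installation.

```perl
use strict;
use warnings;

# Hypothetical map from format designator to backing module. Core modules
# stand in for the real DOM packages here.
my %module_for = (
    dumper => 'Data::Dumper',
    util   => 'List::Util',
);

# Load the module for one format on demand; this function never loads
# the modules registered for other formats.
sub load_format {
    my ($format) = @_;
    my $module = $module_for{ lc $format }
      or die "unknown format '$format'\n";
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    require $file;    # dies if the module is not installed
    return $module;
}

my $loaded = load_format('Dumper');
print "loaded $loaded\n";
```

Because the require happens only when a format is requested, a missing module surfaces as a runtime error for that format alone.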
The minimal DOM interface specifies the following methods. Details can be
obtained from the Element and Document POD.
Element methods:
    get_tagname()
    set_tagname()
    get_attributes()
    set_attributes()
    clear_attributes()
    get_text()
    set_text()
    clear_text()
    get_parent()
    get_children()
    get_first_child()
    get_last_child()
    get_next_sibling()
    get_prev_sibling()
    get_elements_by_tagname()
    set_child()
    prune_child()
    to_xml_string()
Document methods:
    get_encoding()
    set_encoding()
    get_root()
    set_root()
    get_element_by_id()
    get_elements_by_tagname()
    to_xml_string()
    to_xml_file()
Type : Constructor
Title : new
Usage : $dom = Bio::Phylo::NeXML::DOM->new(-format=>$format)
Function: Create a new DOM factory
Returns : DOM object
Args : optional: -format => DOM format (defaults to 'twig')
Type : Factory method
Title : create_element
Usage : $elt = $dom->create_element()
Function: Create a new XML DOM element
Returns : DOM element
Args : Optional:
-tag => $tag_name
-attr => \%attr_hash
Type : Factory method
Title : parse_element
Usage : $elt = $dom->parse_element($text)
Function: Create a new XML DOM element from XML text
Returns : DOM element
Args : An XML String
Type : Creator
Title : create_document
Usage : $doc = $dom->create_document()
Function: Create a new XML DOM document
Returns : DOM document
Args : Package-specific args
Type : Factory method
Title : parse_document
Usage : $doc = $dom->parse_document($text)
Function: Create a new XML DOM document from XML text
Returns : DOM document
Args : An XML String
Type : Mutator
Title : set_format
Usage : $dom->set_format($format)
Function: Set the format (underlying DOM package bindings) for this object
Returns : the invocant ($dom)
Args : format designator as string
Type : Accessor
Title : get_format
Usage : $dom->get_format()
Function: Get the format designator for this object
Returns : format designator as string
Args : none
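In the source listing, get_format normalizes the stored designator with ucfirst(lc(...)), and parse_element appends the result to a class-name prefix. The sketch below reproduces just that string handling. The helper names normalize_format() and element_class() are hypothetical, and no Bio::Phylo code is loaded.

```perl
use strict;
use warnings;

# Same normalization as get_format in the source listing: lower-case
# everything, then capitalize the first letter.
sub normalize_format { ucfirst lc shift }

# Class-name construction as in parse_element; the prefix is copied from
# the source listing, the helper itself is hypothetical.
sub element_class {
    return 'Bio::Phylo::NeXML::DOM::Element::' . normalize_format(shift);
}

my @classes = map { element_class($_) } qw(twig LibXML LIBXML);
print "$_\n" for @classes;
```

The designator is therefore case-insensitive on input, and every spelling of a format resolves to a single class name.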
Type : Static accessor
Title : get_dom
Usage : __PACKAGE__->get_dom()
Function: Get the singleton DOM object
Returns : instance of this __PACKAGE__
Args : none
The DOM creator abstract classes: Bio::Phylo::NeXML::DOM::Element, Bio::Phylo::NeXML::DOM::Document
If you use Bio::Phylo in published research, please cite it:
Rutger A Vos, Jason Caravas, Klaas Hartmann, Mark A Jensen and Chase Miller, 2011. Bio::Phylo - phyloinformatic analysis using Perl. BMC Bioinformatics 12:63. http://dx.doi.org/10.1186/1471-2105-12-63
Mark A. Jensen (maj -at- fortinbras -dot- us), refactored by Rutger Vos
The Bio::Phylo::Annotation class is not yet DOMized.
# $Id: DOM.pm 1660 2011-04-02 18:29:40Z rvos $
package Bio::Phylo::NeXML::DOM;
use strict;
use base 'Bio::Phylo';
use Bio::Phylo::Util::CONSTANT qw'_DOMCREATOR_ looks_like_class';
use Bio::Phylo::Util::Exceptions 'throw';
use Bio::Phylo::Factory;
use File::Spec::Unix;

# store DOM factory object as a global here, to avoid proliferation of
# function arguments
our $DOM;
{
my $CONSTANT_TYPE = _DOMCREATOR_;
my (%format);
my $fac = Bio::Phylo::Factory->new;
sub new {
my $self = shift->SUPER::new( '-format' => 'twig', @_ );
return $DOM = $self;
}
sub create_element {
if ( my $format = shift->get_format ) {
return $fac->create_element( '-format' => $format, @_ );
}
else {
throw 'BadArgs' => 'DOM creator format not set';
}
}
sub parse_element {
if ( my $f = shift->get_format ) {
return looks_like_class( __PACKAGE__ . '::Element::' . $f )
->parse_element(shift);
}
else {
throw 'BadArgs' => 'DOM creator format not set';
}
}
sub create_document {
if ( my $format = shift->get_format ) {
return $fac->create_document( '-format' => $format, @_ );
}
else {
throw 'BadArgs' => 'DOM creator format not set';
}
}
sub parse_document {
if ( my $format = shift->get_format ) {
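# assumption: Document bindings are named like the Element bindings used
# in parse_element, i.e. Bio::Phylo::NeXML::DOM::Document::<Format>
my $implementation = __PACKAGE__ . '::Document::' . $format;
return looks_like_class($implementation)->parse_document(shift);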
}
else {
throw 'BadArgs' => 'DOM creator format not set';
}
}
sub set_format {
my $self = shift;
$format{ $self->get_id } = shift;
return $self;
}
sub get_format {
my $self = shift;
return ucfirst( lc( $format{ $self->get_id } ) );
}
sub get_dom { $DOM ||= __PACKAGE__->new }
sub _type { $CONSTANT_TYPE }
sub _cleanup {
my $self = shift;
delete $format{ $self->get_id };
}
}
1;