Multiple URIs for complex types Taxonomy, Sequence, and Annotation

It has been suggested that the complex types Taxonomy, Sequence, and Annotation should be allowed to have multiple URIs.

Currently (phyloXML version 1.10), each of these elements allows only one URI.

Order of elements in phyloXML

Since the validity of phyloXML documents is enforced by an XSD schema, which declares sub-elements in fixed sequences, the order of elements matters (for more information and discussion, see http://www.w3schools.com/Schema/schema_complex_indicators.asp).

As of 9 September 2010, the BioPerl implementation of the phyloXML format unfortunately produces output with incorrect element order. For Archaeopteryx users, a temporary “solution” is to turn off XSD-based validation by adding the following line to the Archaeopteryx configuration file:

validate_against_phyloxml_xsd_schema: false

Examples of proper order of sub-elements

For <clade>, the order of sub-elements is:

  1. <name>
  2. <branch_length>
  3. <confidence>
  4. <width>
  5. <color>
  6. <taxonomy>
  7. <sequence>
  8. <events>
  9. <binary_characters>
  10. <distribution>
  11. <date>
  12. <reference>
  13. <property>
  14. <clade>

For <sequence>, the order is:

  1. <symbol>
  2. <accession>
  3. <name>
  4. <location>
  5. <mol_seq>
  6. <uri>
  7. <annotation>
  8. <domain_architecture>

For <taxonomy>, the order is:

  1. <id>
  2. <code>
  3. <scientific_name>
  4. <authority>
  5. <common_name>
  6. <synonym>
  7. <rank>
  8. <uri>

Not all sub-elements have to be present, but those that are must appear in the order shown.
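To find misordered elements before turning validation off, a quick check along the following lines may help. This is a minimal sketch, not part of any phyloXML tool: it uses Python's standard xml.etree.ElementTree, copies the three orders listed above, and simply skips any sub-element that does not appear in those lists. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Required sub-element order, copied from the three lists above.
ORDER = {
    "clade": ["name", "branch_length", "confidence", "width", "color",
              "taxonomy", "sequence", "events", "binary_characters",
              "distribution", "date", "reference", "property", "clade"],
    "sequence": ["symbol", "accession", "name", "location", "mol_seq",
                 "uri", "annotation", "domain_architecture"],
    "taxonomy": ["id", "code", "scientific_name", "authority",
                 "common_name", "synonym", "rank", "uri"],
}

def local(tag):
    """Drop an ElementTree namespace prefix such as '{http://www.phyloxml.org}'."""
    return tag.rsplit("}", 1)[-1]

def order_errors(root):
    """Return (parent, child, preceding) tuples for each out-of-order child."""
    errors = []
    for elem in root.iter():
        order = ORDER.get(local(elem.tag))
        if order is None:
            continue  # not one of the three element types checked here
        last_pos, last_name = -1, None
        for child in elem:
            name = local(child.tag)
            if name not in order:
                continue  # elements missing from the lists are ignored
            pos = order.index(name)
            if pos < last_pos:
                errors.append((local(elem.tag), name, last_name))
            else:
                last_pos, last_name = pos, name
    return errors
```

For example, order_errors(ET.parse("tree.xml").getroot()) (with a hypothetical file name) would return an entry such as ("clade", "name", "taxonomy") for a <name> written after a <taxonomy> in the same clade.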

How to add (typed) support values to a given tree

confadd is a simple command-line tool for calculating typed confidence values for a given evolutionary tree.
Its input is typically one evolutionary tree (the ‘target’), which may or may not already have confidence values associated with its branches, and a set of evolutionary trees (the ‘evaluators’, usually hundreds or more) in which the frequency of splits represents the confidence for the ‘target’. The ‘evaluators’ are usually the result of a bootstrap resampling analysis or of a Bayesian method.
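The idea of split frequency can be illustrated with a short sketch. This is not confadd's code: it assumes, purely for illustration, that trees are nested Python tuples of leaf names, that all trees share the same leaf set, and that splits are treated as unrooted bipartitions. Each split is stored as the side not containing the alphabetically first leaf, so equal splits compare equal regardless of rooting. The helper names are hypothetical.

```python
from collections import Counter

def leaves(tree):
    """Return the leaf names of a nested-tuple tree as a frozenset."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def splits(tree):
    """Return the non-trivial splits (bipartitions) of a tree in canonical form."""
    all_leaves = leaves(tree)
    anchor = min(all_leaves)
    result = set()

    def walk(node):
        below = leaves(node)
        # A split is non-trivial if both sides contain at least two leaves.
        if 1 < len(below) < len(all_leaves) - 1:
            result.add(below if anchor not in below else all_leaves - below)
        if not isinstance(node, str):
            for child in node:
                walk(child)

    walk(tree)
    return result

def support(target, evaluators):
    """Fraction of evaluator trees that contain each split of the target."""
    counts = Counter()
    for tree in evaluators:
        counts.update(splits(tree))  # each split counted once per tree
    return {split: counts[split] / len(evaluators) for split in splits(target)}
```

With the target ((("A","B"),"C"),("D","E")) and the two evaluators [target, ((("A","C"),"B"),("D","E"))], the split {D,E} gets support 1.0 and the split {C,D,E} (that is, A,B | C,D,E) gets 0.5.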

See: http://www.phylosoft.org/forester/applications/confadd/

phyloXML reference published

Han M.V. and Zmasek C.M.
“phyloXML: XML for evolutionary biology and comparative genomics”
BMC Bioinformatics 2009, 10:356

Proposed changes and additions for phyloXML version 1.10


Originally published on Tuesday, October 27, 2009.

Changes

Sequence

The ‘type’ value ‘aa’ is changed to ‘protein’.

Id

The elements Sequence, Clade, and Phylogeny have an ‘id’ sub-element.
Currently, ‘id’ has a ‘type’ attribute, which is used to indicate the source or database of the identifier.

‘type’ turned out to be an ill-chosen name.
Hence, we rename ‘type’ to ‘provider’.

Date

Remove the ‘range’ attribute and replace it with ‘minimum’ and ‘maximum’ elements.
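A date with explicit bounds might be written as in the following sketch. It assumes, based on the published 1.10 schema but worth checking against the XSD, that <date> carries a ‘unit’ attribute and the sub-elements desc, value, minimum, and maximum in that order; make_date and the default unit "mya" are illustrative choices, not part of phyloXML.

```python
import xml.etree.ElementTree as ET

def make_date(value, minimum, maximum, unit="mya", desc=None):
    """Build a <date> element with explicit 'minimum' and 'maximum' sub-elements."""
    if not minimum <= value <= maximum:
        raise ValueError("expected minimum <= value <= maximum")
    date = ET.Element("date", unit=unit)
    if desc is not None:
        ET.SubElement(date, "desc").text = desc
    # Assumed schema order: value, then minimum, then maximum.
    for tag, number in (("value", value), ("minimum", minimum), ("maximum", maximum)):
        ET.SubElement(date, tag).text = str(number)
    return date
```

Checking that the value lies within its bounds is a design choice of this sketch; the schema itself may not enforce it.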

Taxonomy

Remove the unnecessary ‘type’ attribute (‘type’ is part of ‘id’ and will be renamed there).
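For existing documents, the renames in this section could be applied mechanically. The sketch below is a hypothetical migration helper, not an official converter: it changes sequence type ‘aa’ to ‘protein’, renames ‘type’ on ‘id’ to ‘provider’, and simply drops ‘type’ from ‘taxonomy’. It leaves the Date ‘range’ attribute alone, since turning it into ‘minimum’ and ‘maximum’ requires knowing how each old value was meant.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop an ElementTree namespace prefix such as '{http://www.phyloxml.org}'."""
    return tag.rsplit("}", 1)[-1]

def migrate_to_1_10(root):
    """Apply the 1.10 attribute changes in place; return the number of changes."""
    changes = 0
    for elem in root.iter():
        name = local(elem.tag)
        if name == "sequence" and elem.get("type") == "aa":
            elem.set("type", "protein")
            changes += 1
        elif name == "id" and "type" in elem.attrib:
            elem.set("provider", elem.attrib.pop("type"))
            changes += 1
        elif name == "taxonomy" and "type" in elem.attrib:
            del elem.attrib["type"]  # this sketch discards the old value
            changes += 1
    return changes
```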

Additions

Sequence

Add an ‘is_aligned’ attribute to the ‘mol_seq’ sub-element.
It indicates that the molecular sequence described by the ‘mol_seq’ sub-element is aligned with all other sequences in the same tree for which ‘is_aligned’ is true (in most cases, this means that gaps were introduced and that all such sequences have the same length).
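Because aligned sequences usually share one length, a reader of the format might sanity-check them as sketched below. This is an assumption-laden illustration: it treats "true" and "1" as true (both are valid XML Schema booleans), ignores whitespace inside ‘mol_seq’, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Drop an ElementTree namespace prefix such as '{http://www.phyloxml.org}'."""
    return tag.rsplit("}", 1)[-1]

def aligned_lengths(phylogeny):
    """Return the set of lengths of all mol_seq elements marked as aligned."""
    lengths = set()
    for elem in phylogeny.iter():
        if local(elem.tag) == "mol_seq" and elem.get("is_aligned") in ("true", "1"):
            residues = "".join((elem.text or "").split())  # strip any whitespace
            lengths.add(len(residues))
    return lengths
```

A result with more than one length suggests that at least one sequence flagged as aligned is not.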

Taxonomy

Add elements for ‘authority’ and ‘synonym’ (the latter as a list, i.e., it may occur multiple times).
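A taxonomy using the new elements still has to respect the sub-element order listed earlier for <taxonomy>. The following hypothetical builder emits fields in that order and writes one element per item when given a list, which is how multiple synonyms would appear; it is a sketch and does not handle attributes such as those on ‘id’.

```python
import xml.etree.ElementTree as ET

# Sub-element order for <taxonomy>, as listed earlier in this document.
TAXONOMY_ORDER = ["id", "code", "scientific_name", "authority",
                  "common_name", "synonym", "rank", "uri"]

def make_taxonomy(**fields):
    """Build <taxonomy> with sub-elements in schema order; lists repeat the element."""
    unknown = set(fields) - set(TAXONOMY_ORDER)
    if unknown:
        raise ValueError(f"unknown taxonomy fields: {sorted(unknown)}")
    tax = ET.Element("taxonomy")
    for tag in TAXONOMY_ORDER:
        value = fields.get(tag)
        if value is None:
            continue
        for item in (value if isinstance(value, list) else [value]):
            ET.SubElement(tax, tag).text = str(item)
    return tax
```

For example, make_taxonomy(rank="species", synonym=["Felis leo"], scientific_name="Panthera leo", authority="(Linnaeus, 1758)") yields the children scientific_name, authority, synonym, rank, regardless of keyword order.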

Point

Add an ‘alt_unit’ attribute, which specifies the unit of the altitude.
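A point carrying an altitude and its unit might be produced as sketched below. The sub-element names lat, long, and alt and the ‘geodetic_datum’ attribute follow the published 1.10 schema as far as I know, but should be checked against the XSD; make_point and its defaults ("m", "WGS84") are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def make_point(lat, lon, alt=None, alt_unit="m", geodetic_datum="WGS84"):
    """Build a <point>; 'alt_unit' is only written when an altitude is given."""
    point = ET.Element("point", geodetic_datum=geodetic_datum)
    ET.SubElement(point, "lat").text = str(lat)
    ET.SubElement(point, "long").text = str(lon)
    if alt is not None:
        point.set("alt_unit", alt_unit)  # the unit is meaningless without an altitude
        ET.SubElement(point, "alt").text = str(alt)
    return point
```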