
Cladescan

Introduction

Welcome to Cladescan, a simple program for automated phylogenetic sensitivity analysis and tree comparison.

Cladescan was written to automate the process of comparing trees. Looking for a particular node in a tree seems like a trivial task, and it is, until you try to look for a complicated node among hundreds of trees. With large numbers of taxa in an analysis, this can become a frequent and tedious occurrence. Worse, it's a process prone to human error. These problems become especially significant when performing sensitivity analyses (e.g., examining best trees from a number of condition sets for presence of a clade of interest).

This program seeks to make your life easier and your results more accurate by doing these comparisons for you and summarizing the results in both textual and graphical formats. "Navajo Rug" sensitivity plots sensu Giribet (Systematic Biology 52, 2003) for each target clade may be output in Scalable Vector Graphics (SVG) format, suitable for import into vector graphics packages such as Adobe Illustrator.
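To give a rough idea of what such a plot contains, here is a minimal Python sketch that draws a grid of cells as SVG, one cell per condition set, filled when the target clade was found. This is not Cladescan's own code or output: the function name navajo_rug_svg, the cell size, and the black/white color scheme are assumptions made for illustration only.

```python
def navajo_rug_svg(presence, cell=20):
    """Render a grid of booleans as an SVG string (filled cell = clade present).

    `presence` is a list of rows; each row is a list of True/False values,
    one per condition set. Hypothetical helper, not part of Cladescan.
    """
    rows = len(presence)
    cols = len(presence[0]) if rows else 0
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{cols * cell}" height="{rows * cell}">'
    ]
    for r, row in enumerate(presence):
        for c, found in enumerate(row):
            fill = "black" if found else "white"
            parts.append(
                f'<rect x="{c * cell}" y="{r * cell}" '
                f'width="{cell}" height="{cell}" '
                f'fill="{fill}" stroke="gray"/>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Example: a 2 x 2 grid of condition sets; the clade was found in two of them.
svg = navajo_rug_svg([[True, False], [False, True]])
```

Writing the returned string to a file with an .svg extension should give something a vector graphics package can open.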

To run Cladescan, you need only a Unix-based operating system with Perl 5.8.x or later, a folder with write permissions, and tree files in simple parenthetical format.

If you have any questions, problems, or suggestions for improvements, please feel free to contact the author at jsanders [at] oeb [dot] harvard [dot] edu.

Why should I use Cladescan?

I have found Cladescan useful for several problems:

  • Quickly examining very large trees for clades of interest. Even simple questions, like "is group X monophyletic in this analysis," can be tedious with very large numbers of taxa.
  • Quickly comparing large trees. Cladescan can take one parenthetical tree as input and look for each node of the input tree in one or more others. The graphical output can then be quickly scanned to look for nodes that differ among the trees.
  • Sensitivity analyses. A phylogenetic sensitivity analysis involves searching through several tree files, each of which may contain a number of trees, for the presence of target clades. With large numbers of taxa and parsimony analyses, the number of trees to search can quickly become overwhelming. Cladescan automatically searches among trees for an arbitrary number of target clades, producing publication-quality vector graphics which can be used to annotate trees later on.

Please let me know about any other useful applications you discover, and I'll post them here.

How it works

Cladescan uses a recursive subroutine to search along all the branches in a tree, stopping when it detects the target clade or exhausts all branches of the tree.

When analyzing a tree for presence of a specific target clade, Cladescan calls the subroutine, passing it both the parenthetical tree and the clade of interest. The subroutine first makes a list of every terminal in the parenthetical tree and compares it to the list of terminals in the target clade (note: topology of the target clade is not important to the match, only that it is a monophyletic clade with the exact same terminals). If it's a match, the subroutine returns a "win" and exits.

If not, the subroutine checks whether the tree is bigger than the target clade, i.e., whether the target clade could possibly be a subset of the tree itself. If the tree is not bigger than the target clade, the subroutine returns a "fail" and exits.
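The two checks described so far can be sketched in Python as follows. This is an illustration under stated assumptions, not Cladescan's Perl source: the function names terminals and classify and the "descend" return value are hypothetical, and terminal names are assumed to contain no parentheses, commas, or semicolons.

```python
import re


def terminals(tree):
    """Return the set of terminal names in a simple parenthetical tree."""
    # Terminal names are whatever is left between parentheses, commas,
    # and the closing semicolon.
    return frozenset(
        name.strip() for name in re.split(r"[(),;]", tree) if name.strip()
    )


def classify(tree, target):
    """Decide what to do with one (sub)tree: 'win', 'fail', or 'descend'."""
    leaves = terminals(tree)
    if leaves == target:
        # Exactly the same terminals; topology inside the clade is ignored.
        return "win"
    if len(leaves) <= len(target):
        # Too small to contain the target clade as a proper subset.
        return "fail"
    # Bigger than the target: the clade may still be further out the branches.
    return "descend"


target = frozenset({"A", "B"})
print(classify("(B,A)", target))        # same terminals, different order
print(classify("((A,B),C);", target))   # larger tree, needs a closer look
print(classify("(A,C)", target))        # same size, different terminals
```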

If the tree might still contain the target clade, the subroutine divides the tree into its first-degree child nodes. Each of these first-degree nodes then gets sent to a child instance of the subroutine and checked for presence of the target clade.

Thus, while the program is scanning a tree, one instance of the subroutine sits at each node of the tree, listening for child subroutines further out the branches of that node to report back a "win" or a "fail." As soon as one subroutine reports success, the message is passed up the hierarchy and the program moves on to the next tree. If all the child subroutines report failure, the tree does not contain the target clade, and the program moves on, as well.
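The whole search can be sketched as a short recursive function in Python. This is a sketch of the procedure described above, not the program's actual Perl code: the names leaf_set, split_children, and contains_clade are hypothetical, and the input is assumed to be a well-formed parenthetical tree without branch lengths.

```python
import re


def leaf_set(tree):
    """Set of terminal names in a parenthetical (sub)tree."""
    return frozenset(n.strip() for n in re.split(r"[(),;]", tree) if n.strip())


def split_children(tree):
    """Split '(x,y,(z,w))' into its first-degree child nodes."""
    tree = tree.strip().rstrip(";").strip()
    if not (tree.startswith("(") and tree.endswith(")")):
        return []  # a single terminal has no child nodes
    inner = tree[1:-1]
    children, depth, start = [], 0, 0
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            # A comma at depth 0 separates two first-degree child nodes.
            children.append(inner[start:i].strip())
            start = i + 1
    children.append(inner[start:].strip())
    return children


def contains_clade(tree, target):
    """True if some node of `tree` has exactly the terminals in `target`."""
    leaves = leaf_set(tree)
    if leaves == target:
        return True   # "win"
    if len(leaves) <= len(target):
        return False  # "fail": too small to contain the target
    # any() stops at the first child that reports success, so a "win" is
    # passed straight up without searching the remaining branches.
    return any(contains_clade(child, target) for child in split_children(tree))


tree = "((A,B),(C,(D,E)));"
print(contains_clade(tree, frozenset({"E", "D"})))       # True
print(contains_clade(tree, frozenset({"C", "D"})))       # False
print(contains_clade(tree, frozenset({"C", "D", "E"})))  # True
```

A sensitivity analysis would then amount to calling contains_clade once per tree and target clade and tabulating the results.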

Installation and use

Installation is simple: just download and unzip the compressed folder to a directory in which you have write permissions.

Figure 1. Flowchart depicting Cladescan operation.

Use: See Figure 1 above for a visual example.

To run, you must create an input file with instructions for the program, such as where to find the tree files, what to do with them, and where to write the output. Detailed instructions can be found in the sample input file included with the program.

Additionally, you must ensure that the input tree files are in the correct format. Currently, input trees must be in simple parenthetical format, without branch lengths or other extraneous information, and with commas separating the branches at each node. Each tree in a file should end with a semicolon, and the file should use Unix line breaks (not classic Mac or Windows line breaks).
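If you are unsure whether a file meets these requirements, a quick check along the following lines may help. This Python sketch is not part of Cladescan; the function name check_tree_text and the wording of its messages are assumptions, and it only tests the requirements listed above.

```python
def check_tree_text(text):
    """Return a list of problems found in the contents of a tree file."""
    problems = []
    if "\r" in text:
        # Carriage returns indicate classic Mac (CR) or Windows (CRLF) files.
        problems.append("file contains Mac or Windows line breaks")
    stripped = text.strip()
    if stripped and not stripped.endswith(";"):
        problems.append("last tree does not end with a semicolon")
    trees = [t.strip() for t in text.split(";") if t.strip()]
    for number, tree in enumerate(trees, start=1):
        if ":" in tree:
            # Branch lengths are normally written as name:length.
            problems.append(f"tree {number} appears to contain branch lengths")
        if tree.count("(") != tree.count(")"):
            problems.append(f"tree {number} has unbalanced parentheses")
    return problems


print(check_tree_text("((A,B),C);\n(A,(B,C));\n"))  # no problems: []
print(check_tree_text("((A:0.1,B:0.2),C);\r\n"))
```

When reading a real file for this check, open it with open(path, newline="") so that Python does not silently convert the line breaks before they can be inspected.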

To run the program, open a terminal, navigate to the directory in which it is located, and start it by typing ./cladescan and pressing return. The program will prompt you for the path to your input file; enter it, press return again, and you're on your way.

Alternatively, you may pass the input file path as an argument when calling the program by typing ./cladescan input.txt, where input.txt is the path to your input file.

Download

Click here to download Cladescan V1.0 (36kB .zip archive)

Version History

Version 1.0: First publicly distributed version.

Citing Cladescan

If you would like to cite this program, here's the reference:

Sanders, Jon G. in press. Cladescan, a program for automated phylogenetic sensitivity analysis. Cladistics.

Copyright

COPYRIGHT 2008 by JON G SANDERS

Distributed under the GNU General Public License v3 (GPLv3). Feel free to share, distribute, modify, and improve it, but please assign proper attribution.

CC BY-NC 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.