# Working with Phylogenetic Trees in Biopython

Phylogenetic trees help you reason about evolutionary relationships instead of looking at sequences in isolation. If you work with comparative genomics, proteins, or species-level data, tree workflows are a core bioinformatics skill.

In this tutorial, you will use `Bio.Phylo` to load, build, convert, root, prune, and visualize phylogenetic trees.

## Loading a Phylogenetic Tree

```python
import requests
from Bio import Phylo

# Download all files used in this tutorial once
urls = {
    "opuntia.dnd": "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/opuntia.dnd",
    "hedgehog.aln": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Clustalw/hedgehog.aln",
    "apaf.xml": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/PhyloXML/apaf.xml",
}

for filename, url in urls.items():
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(response.text)

# Load a Newick tree file and inspect basic properties
tree = Phylo.read("opuntia.dnd", "newick")
print("Rooted:", tree.rooted)
print("Number of terminal clades:", len(tree.get_terminals()))
print("Terminal names:", [clade.name for clade in tree.get_terminals()])
```

```text
Rooted: False
Number of terminal clades: 7
Terminal names: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663', 'gi|6273287|gb|AF191661.1|AF191661', 'gi|6273286|gb|AF191660.1|AF191660', 'gi|6273285|gb|AF191659.1|AF191659', 'gi|6273284|gb|AF191658.1|AF191658']
```

This block downloads the tree and alignment files used throughout the tutorial, then parses the Newick file with `Phylo.read`. The parsed tree object is the starting point for the traversal, editing, and export steps in later sections.
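
Once a tree is parsed, most inspection goes through methods on the tree object. The sketch below uses a small made-up Newick string (the names `A`, `B`, `C` and their branch lengths are illustrative assumptions, not taken from `opuntia.dnd`) so the numbers are easy to check by hand.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical three-taxon tree; names and branch lengths are made up
newick = "((A:1.0,B:2.0):1.0,C:3.0);"
toy_tree = Phylo.read(StringIO(newick), "newick")

# Terminal (leaf) names in the order they appear in the Newick string
terminal_names = [clade.name for clade in toy_tree.get_terminals()]

# Sum of all branch lengths in the tree
total_length = toy_tree.total_branch_length()

# Sum of branch lengths along the path between two tips
dist_ab = toy_tree.distance("A", "B")
dist_ac = toy_tree.distance("A", "C")

# Most recent common ancestor of A and B, and the tips below it
mrca = toy_tree.common_ancestor("A", "B")
mrca_tips = [clade.name for clade in mrca.get_terminals()]

print(terminal_names)
print(total_length, dist_ab, dist_ac)
print(mrca_tips)
```

The same calls should work on the `tree` loaded from `opuntia.dnd`; only the terminal names passed to `distance` and `common_ancestor` change.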

## Building a Phylogenetic Tree from Protein Sequences

```python
from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

# Read a protein multiple sequence alignment in Clustal format
alignment = AlignIO.read("hedgehog.aln", "clustal")

# Compute pairwise distances using a protein substitution model
calculator = DistanceCalculator("blosum62")
distance_matrix = calculator.get_distance(alignment)

# Build a Neighbor-Joining tree from the distance matrix
constructor = DistanceTreeConstructor()
protein_tree = constructor.nj(distance_matrix)

print("Terminal clades:", len(protein_tree.get_terminals()))
print("First five terminal names:", [c.name for c in protein_tree.get_terminals()[:5]])
```

```text
Terminal clades: 5
First five terminal names: ['gi|13990994|dbj|BAA33523.2|', 'gi|167877390|gb|EDS40773.1|', 'gi|167234445|ref|NP_001107837.', 'gi|74100009|gb|AAZ99217.1|', 'gi|56122354|gb|AAV74328.1|']
```

Here you convert a protein alignment into a pairwise distance matrix scored with BLOSUM62, then build a Neighbor-Joining tree from that matrix. This is a practical workflow when you already have aligned protein sequences and need an interpretable evolutionary topology quickly.
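
To see what the distance step computes, it can help to run the same pipeline on an alignment small enough to check by hand. The sketch below builds a toy alignment in memory (the sequences and the ids `s1` to `s4` are made up for illustration) and swaps in the `"identity"` model, which scores columns as match or mismatch instead of using BLOSUM62. It also builds a UPGMA tree from the same matrix for comparison.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical aligned sequences of equal length; ids are made up
records = [
    SeqRecord(Seq("ACGTACGT"), id="s1"),
    SeqRecord(Seq("ACGTACGA"), id="s2"),
    SeqRecord(Seq("ACGAACTA"), id="s3"),
    SeqRecord(Seq("TCGAACTA"), id="s4"),
]
toy_alignment = MultipleSeqAlignment(records)

# "identity" distance = 1 - (fraction of identical columns)
toy_matrix = DistanceCalculator("identity").get_distance(toy_alignment)
d_s1_s2 = toy_matrix["s1", "s2"]  # s1 and s2 differ in one of eight columns

# Build trees from the same matrix with two distance-based methods
constructor = DistanceTreeConstructor()
nj_tree = constructor.nj(toy_matrix)
upgma_tree = constructor.upgma(toy_matrix)

nj_names = sorted(clade.name for clade in nj_tree.get_terminals())
upgma_names = sorted(clade.name for clade in upgma_tree.get_terminals())
print(d_s1_s2)
print(nj_names)
print(upgma_names)
```

Both trees contain the same four tips; they can differ in topology and branch lengths because UPGMA assumes a constant rate of change across lineages while Neighbor-Joining does not.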

## Reading and Writing Phylo Trees

```python
from Bio import Phylo

# Read Newick and PhyloXML trees from local files
newick_tree = Phylo.read("opuntia.dnd", "newick")
phyloxml_tree = Phylo.read("apaf.xml", "phyloxml")

# Write trees to different output formats
Phylo.write(newick_tree, "opuntia_copy.xml", "phyloxml")
Phylo.write(phyloxml_tree, "apaf_copy.nwk", "newick")

print("Wrote opuntia_copy.xml and apaf_copy.nwk")
```

```text
Wrote opuntia_copy.xml and apaf_copy.nwk
```

This block demonstrates format conversion across common tree standards. Converting between Newick and PhyloXML is useful when tools in your pipeline expect different file formats. Keep in mind that Newick stores little more than names, branch lengths, and support values, so PhyloXML annotations such as taxonomy or sequence data are dropped when you write to Newick.
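
When the goal is only to change formats, `Phylo.convert` combines the read and write steps in one call. The sketch below is a minimal round-trip check under stated assumptions: the input file `toy.nwk` and its three-taxon tree are made up, and everything is written to a temporary directory so no tutorial files are touched.

```python
import os
import tempfile

from Bio import Phylo

with tempfile.TemporaryDirectory() as tmpdir:
    newick_path = os.path.join(tmpdir, "toy.nwk")
    xml_path = os.path.join(tmpdir, "toy.xml")

    # Hypothetical input file with a made-up tree
    with open(newick_path, "w", encoding="utf-8") as handle:
        handle.write("((A:1.0,B:2.0):1.0,C:3.0);\n")

    # Parse the trees in the input file and write them in the target format
    Phylo.convert(newick_path, "newick", xml_path, "phyloxml")

    # Read both files back to compare their contents
    original = Phylo.read(newick_path, "newick")
    round_trip = Phylo.read(xml_path, "phyloxml")

original_names = [clade.name for clade in original.get_terminals()]
round_trip_names = [clade.name for clade in round_trip.get_terminals()]
original_length = original.total_branch_length()
round_trip_length = round_trip.total_branch_length()

print(original_names, round_trip_names)
print(original_length, round_trip_length)
```

Comparing terminal names and total branch length after a conversion is a cheap sanity check before handing the file to another tool.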

## Rooting and Re-rooting Trees

```python
from Bio import Phylo

# Load the tree (the Newick parser reports it as unrooted)
tree = Phylo.read("opuntia.dnd", "newick")

# Midpoint root is a practical default when no outgroup is available
tree.root_at_midpoint()
print("Rooted after midpoint rooting:", tree.rooted)

# Re-root on an explicit outgroup terminal; the first terminal is used
# here purely as a stand-in for a biologically chosen outgroup
terminals = tree.get_terminals()
if terminals:
    tree.root_with_outgroup(terminals[0])
    print("Re-rooted with outgroup:", terminals[0].name)
```

```text
Rooted after midpoint rooting: True
Re-rooted with outgroup: gi|6273291|gb|AF191665.1|AF191665
```

This section shows two common rooting workflows: midpoint rooting for exploratory analysis and outgroup rooting for biologically guided trees. The first terminal stands in for an outgroup here only to keep the example short; in practice, choose a taxon known to lie outside the group of interest. Correct rooting is essential when you interpret ancestor-descendant direction.
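
The effect of outgroup rooting is easier to verify on a tree where the outgroup is obvious. The sketch below assumes a made-up five-taxon tree in which `O` sits on a long branch away from the ingroup `A` to `D`; all names and branch lengths are illustrative. It checks that the outgroup ends up directly below the root, that the ingroup forms a single clade, and that tip-to-tip path lengths are unchanged.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical unrooted tree: ingroup A-D plus a distant taxon O
newick = "((A:1.0,B:1.0):1.0,(C:1.0,D:1.0):1.0,O:5.0);"
toy_tree = Phylo.read(StringIO(newick), "newick")
dist_before = toy_tree.distance("A", "O")

# Root on the outgroup, addressed by its name
toy_tree.root_with_outgroup("O")

# The outgroup should now be a direct child of the root
root_child_names = [child.name for child in toy_tree.root.clades]

# The remaining taxa should form one complete clade below the root;
# is_monophyletic returns the clade on success and False otherwise
ingroup = [tip for tip in toy_tree.get_terminals() if tip.name != "O"]
ingroup_is_clade = toy_tree.is_monophyletic(*ingroup) is not False

# Re-rooting moves the root but should not change path lengths between tips
dist_after = toy_tree.distance("A", "O")

print(toy_tree.rooted, root_child_names, ingroup_is_clade)
print(dist_before, dist_after)
```

A monophyly check like this is a reasonable guard after rooting real data: if the ingroup does not form one clade, the chosen outgroup may not be appropriate for that tree.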

## Pruning to Taxa of Interest

```python
from Bio import Phylo

# Load tree and get terminal labels
tree = Phylo.read("opuntia.dnd", "newick")
terminals = [clade.name for clade in tree.get_terminals()]

# Keep only a small panel of taxa by pruning the rest
keep = set(terminals[:4])
for clade in list(tree.get_terminals()):
    if clade.name not in keep:
        tree.prune(clade)

print("Remaining terminals after pruning:", [c.name for c in tree.get_terminals()])
```

```text
Remaining terminals after pruning: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663', 'gi|6273287|gb|AF191661.1|AF191661']
```

Pruning lets you focus on a biologically relevant subset, such as one genus or one set of samples from a larger tree. This is useful for cleaner figures and targeted interpretation. Note that `prune` modifies the tree in place, so reload or copy the tree first if you still need the full version.
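
If you prune often, wrapping the loop in a small helper keeps the original tree intact and catches misspelled taxon names early. The function `prune_to` below is a hypothetical helper written for this tutorial, not part of Biopython, and the five-taxon tree is made up for illustration.

```python
from copy import deepcopy
from io import StringIO

from Bio import Phylo


def prune_to(tree, keep_names):
    """Return a pruned copy of `tree` that keeps only the named terminals."""
    keep = set(keep_names)
    present = {tip.name for tip in tree.get_terminals()}
    missing = keep - present
    if missing:
        raise ValueError(f"Names not found in tree: {sorted(missing)}")
    pruned = deepcopy(tree)  # leave the caller's tree untouched
    for tip in pruned.get_terminals():  # list snapshot, safe to prune while looping
        if tip.name not in keep:
            pruned.prune(tip)
    return pruned


# Hypothetical tree; names and branch lengths are made up
newick = "((A:1.0,B:1.0):1.0,(C:1.0,D:1.0):1.0,E:2.0);"
toy_tree = Phylo.read(StringIO(newick), "newick")

small_tree = prune_to(toy_tree, ["A", "C", "E"])
kept_names = [tip.name for tip in small_tree.get_terminals()]
original_count = len(toy_tree.get_terminals())

print(kept_names, original_count)
```

Working on a deep copy costs some memory on large trees, but it means the same parsed tree can feed several differently pruned figures.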

## Extracting and Exporting a Subtree

```python
from copy import deepcopy

from Bio import Phylo

# Read the original tree and collect internal clades below the root
# (the root itself is skipped because it would reproduce the whole tree)
tree = Phylo.read("opuntia.dnd", "newick")
internal_clades = [c for c in tree.get_nonterminals() if c is not tree.root]

if not internal_clades:
    raise ValueError("No internal clades found for subtree extraction.")

# Copy the first internal clade so the original tree stays untouched
target_clade = internal_clades[0]
subtree = deepcopy(target_clade)

# Wrap clade in a Tree object and export
subtree_tree = Phylo.BaseTree.Tree(root=subtree)
Phylo.write(subtree_tree, "opuntia_subtree.nwk", "newick")
print("Subtree terminals:", [c.name for c in subtree_tree.get_terminals()])
print("Wrote subtree to opuntia_subtree.nwk")
```

```text
Subtree terminals: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663']
Wrote subtree to opuntia_subtree.nwk
```

Subtree export is practical when you need to share one branch with collaborators or run downstream analyses on one lineage only. The root is excluded from the candidates because it is the first non-terminal clade in traversal order, and selecting it would export the entire tree instead of a single lineage.
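
Picking a clade by position works for a demo, but in practice you usually know which taxa the subtree must contain. One way to express that, sketched here on a made-up five-taxon tree, is to take the most recent common ancestor of two named tips and export everything below it. The names `C` and `D` are illustrative assumptions.

```python
from copy import deepcopy
from io import StringIO

from Bio import Phylo
from Bio.Phylo.BaseTree import Tree

# Hypothetical tree; names and branch lengths are made up
newick = "((A:1.0,B:2.0):1.0,(C:1.0,D:1.0):1.0,E:2.0);"
toy_tree = Phylo.read(StringIO(newick), "newick")

# Select the clade by the taxa it must contain
target_clade = toy_tree.common_ancestor("C", "D")

# Copy the clade and wrap it in its own Tree object
subtree = Tree(root=deepcopy(target_clade))
subtree_names = [tip.name for tip in subtree.get_terminals()]

# Serialize to a Newick string in memory instead of a file
buffer = StringIO()
Phylo.write(subtree, buffer, "newick")
subtree_newick = buffer.getvalue().strip()

original_count = len(toy_tree.get_terminals())
print(subtree_names, original_count)
print(subtree_newick)
```

Writing to a `StringIO` buffer is handy when the Newick text is going into a database field or a web response rather than onto disk.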

## Visualizing Phylogenetic Trees

```python
import matplotlib.pyplot as plt
from Bio import Phylo

# Read the tree object
visual_tree = Phylo.read("opuntia.dnd", "newick")

# ASCII view is useful in terminal-only environments
Phylo.draw_ascii(visual_tree)

# Draw on an explicit Axes so that figsize applies to the plotted tree;
# without axes=..., Phylo.draw creates its own default-sized figure
fig, ax = plt.subplots(figsize=(10, 6))
Phylo.draw(visual_tree, axes=ax, do_show=False)
ax.set_title("Opuntia Phylogenetic Tree")
fig.tight_layout()

# Save static files for reports and papers, then display the figure
fig.savefig("opuntia_tree.png", dpi=300)
fig.savefig("opuntia_tree.pdf")
print("Saved opuntia_tree.png and opuntia_tree.pdf")
plt.show()
```

```text
                             _______________ gi|6273291|gb|AF191665.1|AF191665
  __________________________|
 |                          |   ______ gi|6273290|gb|AF191664.1|AF191664
 |                          |__|
 |                             |_____ gi|6273289|gb|AF191663.1|AF191663
 |
_|_________________ gi|6273287|gb|AF191661.1|AF191661
 |
 |__________ gi|6273286|gb|AF191660.1|AF191660
 |
 |    __ gi|6273285|gb|AF191659.1|AF191659
 |___|
     | gi|6273284|gb|AF191658.1|AF191658

Saved opuntia_tree.png and opuntia_tree.pdf
```

You get two complementary visual outputs: quick ASCII inspection for debugging and a full plotted tree for reports and presentations. Passing `axes=ax` matters here: calling `plt.figure(figsize=...)` and then `Phylo.draw` without an axes argument leaves the sized figure empty, because `Phylo.draw` opens a new figure of its own. Visualization is often the fastest way to detect unusual branch structure before deeper analysis.
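
Long database-style identifiers tend to crowd a plotted tree. The sketch below shows one way to shorten tip labels with the `label_func` argument of `Phylo.draw` and to highlight a clade by setting its `color`. The tree, the `db|X1`-style names, and the helper `short_label` are all made up for illustration, and the non-interactive `Agg` backend is used so the script can run without a display.

```python
import os
import tempfile
from io import StringIO

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files only
import matplotlib.pyplot as plt
from Bio import Phylo

# Hypothetical tree with database-style tip names
newick = "((db|X1:1.0,db|X2:2.0):1.0,db|X3:3.0);"
toy_tree = Phylo.read(StringIO(newick), "newick")


def short_label(clade):
    """Label tips by the last '|'-separated field; leave unnamed clades unlabeled."""
    if clade.name:
        return clade.name.split("|")[-1]
    return None


# Color one clade (and its descendants) to make it stand out
toy_tree.common_ancestor("db|X1", "db|X2").color = "blue"

fig, ax = plt.subplots(figsize=(6, 3))
Phylo.draw(toy_tree, axes=ax, label_func=short_label, do_show=False)
ax.set_title("Toy tree with shortened labels")
fig.tight_layout()

# Save to a temporary directory and record the file size as a sanity check
with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "toy_tree.png")
    fig.savefig(out_path, dpi=150)
    png_size = os.path.getsize(out_path)
plt.close(fig)

first_label = short_label(toy_tree.get_terminals()[0])
print(first_label, png_size > 0)
```

For the Opuntia tree, the same `short_label` function would reduce each name to its final accession field, which usually reads better in a figure than the full GenBank header.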