Phylogenetic trees help you reason about evolutionary relationships instead of looking at sequences in isolation. If you work with comparative genomics, proteins, or species-level data, tree workflows are a core bioinformatics skill. In this tutorial, you will use `Bio.Phylo` to build, read, write, and visualize phylogenetic trees.

## Building Phylogenetic Trees

```python
import requests
from Bio import Phylo

# Download all files used in this tutorial once
urls = {
    "opuntia.dnd": "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/opuntia.dnd",
    "hedgehog.aln": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Clustalw/hedgehog.aln",
    "apaf.xml": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/PhyloXML/apaf.xml",
}
for filename, url in urls.items():
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(response.text)

# Load a Newick tree file and inspect basic properties
tree = Phylo.read("opuntia.dnd", "newick")
print("Rooted:", tree.rooted)
print("Number of terminal clades:", len(tree.get_terminals()))
print("Terminal names:", [clade.name for clade in tree.get_terminals()])
```

```
Rooted: False
Number of terminal clades: 7
Terminal names: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663', 'gi|6273287|gb|AF191661.1|AF191661', 'gi|6273286|gb|AF191660.1|AF191660', 'gi|6273285|gb|AF191659.1|AF191659', 'gi|6273284|gb|AF191658.1|AF191658']
```

This block downloads the tree/alignment files used throughout the tutorial and reads a Newick tree with `Phylo.read`. Starting with a parsed tree object gives you a foundation for traversal, analysis, and export in later sections.

## Building a Phylogenetic Tree from Protein Sequences

```python
from Bio import AlignIO
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor

# Read a protein multiple sequence alignment in Clustal format
alignment = AlignIO.read("hedgehog.aln", "clustal")

# Compute pairwise distances using a protein substitution model
calculator = DistanceCalculator("blosum62")
distance_matrix = calculator.get_distance(alignment)

# Build a Neighbor-Joining tree from the distance matrix
constructor = DistanceTreeConstructor()
protein_tree = constructor.nj(distance_matrix)

print("Terminal clades:", len(protein_tree.get_terminals()))
print("First five terminal names:", [c.name for c in protein_tree.get_terminals()[:5]])
```

```
Terminal clades: 5
First five terminal names: ['gi|13990994|dbj|BAA33523.2|', 'gi|167877390|gb|EDS40773.1|', 'gi|167234445|ref|NP_001107837.', 'gi|74100009|gb|AAZ99217.1|', 'gi|56122354|gb|AAV74328.1|']
```

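To see the same steps without depending on a downloaded file, here is a minimal sketch on a toy protein alignment built in memory. The sequences and the IDs `seqA` to `seqD` are invented for illustration; only `DistanceCalculator` and `DistanceTreeConstructor` come from the tutorial code.

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Phylo.TreeConstruction import DistanceCalculator, DistanceTreeConstructor
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical toy alignment: four equal-length protein sequences
records = [
    SeqRecord(Seq("MKTAYIAKQR"), id="seqA"),
    SeqRecord(Seq("MKTAYIAKQK"), id="seqB"),
    SeqRecord(Seq("MRTGYLSKQR"), id="seqC"),
    SeqRecord(Seq("MRTGYLSRQK"), id="seqD"),
]
toy_alignment = MultipleSeqAlignment(records)

# Pairwise distances under BLOSUM62 (0 means identical sequences)
toy_matrix = DistanceCalculator("blosum62").get_distance(toy_alignment)
print(toy_matrix)

# Neighbor-Joining tree built from the distance matrix
toy_tree = DistanceTreeConstructor().nj(toy_matrix)
toy_leaves = sorted(leaf.name for leaf in toy_tree.get_terminals())
print(toy_leaves)
```

Printing the matrix before building the tree is a quick sanity check: near-identical sequences (here `seqA` and `seqB`) should have the smallest distances.
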
Here you convert a protein alignment into a distance matrix and then into a tree with Neighbor-Joining. This is a practical workflow when you already have aligned protein sequences and need an interpretable evolutionary topology quickly.

## Reading and Writing Phylo Trees

```python
from Bio import Phylo

# Read Newick and PhyloXML trees from local files
newick_tree = Phylo.read("opuntia.dnd", "newick")
phyloxml_tree = Phylo.read("apaf.xml", "phyloxml")

# Write trees to different output formats
Phylo.write(newick_tree, "opuntia_copy.xml", "phyloxml")
Phylo.write(phyloxml_tree, "apaf_copy.nwk", "newick")

print("Wrote opuntia_copy.xml and apaf_copy.nwk")
```

```
Wrote opuntia_copy.xml and apaf_copy.nwk
```

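When you do not need the intermediate tree object, `Phylo.convert` does the read and write in one call. The sketch below uses a made-up four-taxon Newick string and a temporary directory (both assumptions for illustration), then reads the PhyloXML file back to check that the leaf names survived.

```python
import os
import tempfile

from Bio import Phylo

# Hypothetical four-taxon tree used only for this example
newick_text = "((A:0.1,B:0.2):0.3,(C:0.4,D:0.5):0.6);"

with tempfile.TemporaryDirectory() as workdir:
    newick_path = os.path.join(workdir, "toy.nwk")
    xml_path = os.path.join(workdir, "toy.xml")
    with open(newick_path, "w", encoding="utf-8") as handle:
        handle.write(newick_text)

    # Parse the Newick file and write PhyloXML in a single step;
    # the return value is the number of trees written
    n_converted = Phylo.convert(newick_path, "newick", xml_path, "phyloxml")

    # Read the converted file back to confirm the round trip
    round_trip = Phylo.read(xml_path, "phyloxml")

leaf_names = [leaf.name for leaf in round_trip.get_terminals()]
print(n_converted, leaf_names)
```

A round-trip check like this is cheap insurance before handing converted files to another tool.
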
This block demonstrates format conversion across common tree standards. Converting between Newick and PhyloXML is useful when tools in your pipeline expect different file formats.

## Rooting and Re-rooting Trees

```python
from Bio import Phylo

# Load an unrooted/partially rooted tree
tree = Phylo.read("opuntia.dnd", "newick")

# Midpoint root is a practical default when no outgroup is available
tree.root_at_midpoint()
print("Rooted after midpoint rooting:", tree.rooted)

# Re-root using an explicit outgroup terminal if present
terminals = tree.get_terminals()
if terminals:
    tree.root_with_outgroup(terminals[0])
    print("Re-rooted with outgroup:", terminals[0].name)
```

```
Rooted after midpoint rooting: True
Re-rooted with outgroup: gi|6273291|gb|AF191665.1|AF191665
```

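A quick way to check an outgroup rooting is to confirm that the remaining taxa form a single clade afterwards. The following sketch does this on a hypothetical five-taxon tree in which `E` is assumed to be the outgroup. The taxon names and branch lengths are invented.

```python
from io import StringIO

from Bio import Phylo

# Hypothetical unrooted tree; E sits on a long branch and plays the outgroup
toy = Phylo.read(StringIO("((A:1,B:1):1,(C:1,D:1):1,E:4);"), "newick")

# Select the outgroup by name and root the tree on it
toy.root_with_outgroup({"name": "E"})

# After rooting, the ingroup taxa should share one exclusive ancestor
ingroup = [toy.find_any(name=label) for label in "ABCD"]
ingroup_is_clade = toy.is_monophyletic(ingroup) is not False

root_children = [child.name for child in toy.root.clades]
print(toy.rooted, ingroup_is_clade, root_children)
```

`is_monophyletic` returns the common ancestor clade when the taxa are monophyletic and `False` otherwise, which is why the check compares against `False` rather than relying on truthiness.
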
This section shows two common rooting workflows: midpoint rooting for exploratory analysis and explicit outgroup rooting for biologically guided trees. Correct rooting is essential when you interpret ancestor-descendant direction.

## Pruning to Taxa of Interest

```python
from Bio import Phylo

# Load tree and get terminal labels
tree = Phylo.read("opuntia.dnd", "newick")
terminals = [clade.name for clade in tree.get_terminals()]

# Keep only a small panel of taxa by pruning the rest
keep = set(terminals[:4])
for clade in list(tree.get_terminals()):
    if clade.name not in keep:
        tree.prune(clade)

print("Remaining terminals after pruning:", [c.name for c in tree.get_terminals()])
```

```
Remaining terminals after pruning: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663', 'gi|6273287|gb|AF191661.1|AF191661']
```

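In practice you usually prune by name rather than by position in the terminal list. The helper below is a hypothetical convenience wrapper (`prune_to_taxa` is not part of Biopython) that also fails early when a requested taxon is missing. It is a sketch on an invented five-taxon tree.

```python
from io import StringIO

from Bio import Phylo


def prune_to_taxa(tree, keep_names):
    """Prune, in place, every terminal whose name is not in keep_names."""
    keep = set(keep_names)
    present = {leaf.name for leaf in tree.get_terminals()}
    missing = keep - present
    if missing:
        raise ValueError(f"Taxa not found in tree: {sorted(missing)}")
    # Iterate over a copy because prune() modifies the tree as we go
    for leaf in list(tree.get_terminals()):
        if leaf.name not in keep:
            tree.prune(leaf)
    return tree


# Hypothetical toy tree used only for this example
toy = Phylo.read(StringIO("((A:1,B:1):1,(C:1,(D:1,E:1):1):1);"), "newick")
prune_to_taxa(toy, ["A", "C", "E"])
remaining = [leaf.name for leaf in toy.get_terminals()]
print(remaining)
```

Checking for missing names first avoids silently producing a smaller tree than intended when a label is misspelled.
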
Pruning lets you focus on a biologically relevant subset, such as one genus or one set of samples from a larger tree. This is useful for cleaner figures and targeted interpretation.

## Extracting and Exporting a Subtree

```python
from copy import deepcopy

from Bio import Phylo

# Read original tree and collect its internal (non-terminal) clades
tree = Phylo.read("opuntia.dnd", "newick")
internal_clades = [c for c in tree.find_clades() if not c.is_terminal()]
if not internal_clades:
    raise ValueError("No internal clades found for subtree extraction.")

# find_clades() walks the tree in preorder, so index 0 is the root clade and
# the "subtree" below spans all seven terminals; pick a later index to
# extract a smaller lineage
target_clade = internal_clades[0]
subtree = deepcopy(target_clade)

# Wrap clade in a Tree object and export
subtree_tree = Phylo.BaseTree.Tree(root=subtree)
Phylo.write(subtree_tree, "opuntia_subtree.nwk", "newick")

print("Subtree terminals:", [c.name for c in subtree_tree.get_terminals()])
print("Wrote subtree to opuntia_subtree.nwk")
```

```
Subtree terminals: ['gi|6273291|gb|AF191665.1|AF191665', 'gi|6273290|gb|AF191664.1|AF191664', 'gi|6273289|gb|AF191663.1|AF191663', 'gi|6273287|gb|AF191661.1|AF191661', 'gi|6273286|gb|AF191660.1|AF191660', 'gi|6273285|gb|AF191659.1|AF191659', 'gi|6273284|gb|AF191658.1|AF191658']
Wrote subtree to opuntia_subtree.nwk
```

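To target a specific lineage rather than a clade chosen by list position, one option is to take the most recent common ancestor of two taxa you care about. This sketch uses an invented five-taxon tree and writes the Newick text to an in-memory buffer instead of a file. The taxon names `C` and `E` are assumptions.

```python
from copy import deepcopy
from io import StringIO

from Bio import Phylo
from Bio.Phylo.BaseTree import Tree

# Hypothetical toy tree used only for this example
toy = Phylo.read(StringIO("((A:1,B:1):1,(C:1,(D:1,E:1):1):1);"), "newick")

# Most recent common ancestor of the two taxa that bracket the lineage
mrca = toy.common_ancestor({"name": "C"}, {"name": "E"})

# Copy the clade so later edits to the subtree leave the original untouched
subtree = Tree(root=deepcopy(mrca), rooted=True)

# Export to an in-memory buffer; a filename would work the same way
buffer = StringIO()
Phylo.write(subtree, buffer, "newick")
subtree_leaves = [leaf.name for leaf in subtree.get_terminals()]
print(subtree_leaves)
print(buffer.getvalue().strip())
```

Selecting by common ancestor keeps the extraction stable even if the order of clades in the file changes.
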
Subtree export is practical when you need to share one branch with collaborators or run downstream analyses on one lineage only.

## Visualizing Phylogenetic Trees

```python
import matplotlib.pyplot as plt
from Bio import Phylo

# Read the tree object
visual_tree = Phylo.read("opuntia.dnd", "newick")

# ASCII view is useful in terminal-only environments
Phylo.draw_ascii(visual_tree)

# Create the figure and axes first and pass the axes to Phylo.draw;
# without `axes=`, Phylo.draw opens its own figure, so figsize is ignored
# and an empty figure is left behind
fig, ax = plt.subplots(figsize=(10, 6))
Phylo.draw(visual_tree, axes=ax, do_show=False)
ax.set_title("Opuntia Phylogenetic Tree")
fig.tight_layout()

# Save static files for reports and papers before showing the figure
fig.savefig("opuntia_tree.png", dpi=300)
fig.savefig("opuntia_tree.pdf")
print("Saved opuntia_tree.png and opuntia_tree.pdf")

plt.show()
```

```
                             _______________ gi|6273291|gb|AF191665.1|AF191665
  __________________________|
 |                          |   ______ gi|6273290|gb|AF191664.1|AF191664
 |                          |__|
 |                             |_____ gi|6273289|gb|AF191663.1|AF191663
 |
_|_________________ gi|6273287|gb|AF191661.1|AF191661
 |
 |__________ gi|6273286|gb|AF191660.1|AF191660
 |
 |    __ gi|6273285|gb|AF191659.1|AF191659
 |___|
     | gi|6273284|gb|AF191658.1|AF191658

Saved opuntia_tree.png and opuntia_tree.pdf
```

You get two complementary visual outputs: quick ASCII inspection for debugging and a full plotted tree for reports and presentations. Visualization is often the fastest way to detect unusual branch structure before deeper analysis.
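
Long GenBank-style identifiers can crowd a plotted tree. One possible refinement, sketched here on an invented three-taxon tree, is to pass a `label_func` to `Phylo.draw` that keeps only the last `|`-separated field of each terminal name. The helper `short_label`, the toy names, and the `Agg` backend (chosen so the example runs without a display) are all assumptions for illustration.

```python
from io import BytesIO, StringIO

import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from Bio import Phylo


def short_label(clade):
    """Return the last '|'-separated field for terminals, None otherwise."""
    if clade.is_terminal() and clade.name:
        return clade.name.split("|")[-1]
    return None  # internal clades are left unlabeled


# Hypothetical tree with GenBank-style names, used only for this example
toy = Phylo.read(
    StringIO("((gi|1|gb|X1.1|X1:1,gi|2|gb|X2.1|X2:1):1,gi|3|gb|X3.1|X3:2);"),
    "newick",
)

fig, ax = plt.subplots(figsize=(8, 4))
Phylo.draw(toy, axes=ax, label_func=short_label, do_show=False)
ax.set_title("Toy tree with shortened labels")
fig.tight_layout()

# Render to an in-memory PNG; pass a filename instead to save to disk
png_buffer = BytesIO()
fig.savefig(png_buffer, format="png", dpi=150)
plt.close(fig)
print(len(png_buffer.getvalue()), "bytes of PNG data")
```

The tree object itself is unchanged. Only the text drawn on the figure is shortened, so exported Newick or PhyloXML files keep the full identifiers.
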