Homologous gene comparisons are a core task in comparative genomics. A practical workflow usually starts with downloading related sequences, checking quality, and aligning pairs before deeper analyses like tree building. In this tutorial, you will download homologous nucleotide sequences and align them with Biopython. ## Downloading Homologous Gene Sequences
import requests
from Bio import SeqIO
# Download three homologous nucleotide sequences from related plant examples
urls = {
"lupine.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/lupine.nu",
"lavender.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/lavender.nu",
"phlox.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/phlox.nu",
}
records = []
for filename, url in urls.items():
response = requests.get(url, timeout=30)
response.raise_for_status()
with open(filename, "w", encoding="utf-8") as f:
f.write(response.text)
records.append(SeqIO.read(filename, "fasta"))
print("Downloaded records:", [r.id for r in records])
print("Sequence lengths:", [len(r.seq) for r in records])Downloaded records: ['gi|5049839|gb|AI730987.1|AI730987', 'gi|5690369|gb|AF158246.1|AF158246', 'gi|5052071|gb|AF067555.1|AF067555'] Sequence lengths: [655, 550, 623]
This block downloads and stores homologous sequences locally so later alignment steps can be rerun without additional network calls. ## Pairwise Global Alignment of Homologs
from Bio import SeqIO
from Bio.Align import PairwiseAligner
# Read local FASTA records
lupine = SeqIO.read("lupine.nu", "fasta")
lavender = SeqIO.read("lavender.nu", "fasta")
# Configure a global aligner for homolog comparison
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -3
aligner.extend_gap_score = -1
score = aligner.score(str(lupine.seq), str(lavender.seq))
alignment = aligner.align(str(lupine.seq), str(lavender.seq))[0]
print("Alignment score:", score)
print(alignment)Alignment score: 185.0
target 0 GAAAATTCATTTTCTTTGGACTTTCTCTGAAATCCGAGTCCTAGGAAAG-ATGCGTGAGA
0 -----------------||-----||||.||.|-|..||-|||||.|.|-.|||...||.
query 0 -----------------GG-----CTCTTAAGT-CATGT-CTAGGCAGGTGTGCACAAGT
target 59 TT-CTTCATATTCAGGCAGGGCA-GTGCGGCAACCAGATCGGTGCCAATTTTTGGGAGGT
60 ||-..||.|.|||.|.||--|||-.||.||-----||||.||-|-------||||||..|
query 36 TTAGGTCTTGTTCCGCCA--GCATCTGAGG-----AGATTGG-G-------TTGGGAACT
target 117 GGTTTGC--GCCGAACA-CGGCATAAATTC---AACGGGA---CGCTATCAAGGGGACAA
120 |||.|||--|.|-|.||-|.|..|..|..|---|.|||||---|.||-|||...|--||.
query 81 GGTATGCATGTC-ACCATCTGAGTTTAGCCCAGATCGGGAGTCCTCT-TCATCAG--CAT
target 168 TGATTTGCAACTGGAGCGCGTTAACGTTTACTACAATGAAGCGAGT--TGC-GGGAGGTT
180 |||---|.||.|.|||...|..|..|.|--|-|.|||||.|.|.||--.||-.|||..|.
query 137 TGA---GAAAGTAGAGACTGGCACAGGT--C-ATAATGAGGTGGGTCAAGCATGGATATG
target 225 CGTTCCACGAGCCGTCCTTATGGACTTGG---AGCCTGGCACTATGGAT-AGTGTTAGAT
240 .|..||..|..|.|.|.|.|..||...||---||.||.|||-||.|.||-||..||...|
query 191 AGGCCCTGGTTCTGCCATCACCGAGACGGCAAAGGCTAGCA-TAGGTATGAGACTTCCTT
target 281 CCGGGCCGTAC-GGGCAGATTTTCAGGCCTGACAACTTTGTTTTCGGCCAGTCCGG--TG
300 ||.|.|.||.|-|.|.|-|.|...|-||||-|||--...||||..||||--|..||--||
query 250 CCTGCCAGTGCTGAGGA-ACTGGGA-GCCT-ACA--GAGGTTTGAGGCC--TTGGGTCTG
target 338 CAGGGAATAACTGGGCCAAGGGTCATTACACGGAAGGGGCCGAGTTGATCG--ACTCANT
360 |-..|||.||--.|||||--|.|------.|||.|---.|....||||.||--.||||-.
query 303 C-CAGAAGAA--AGGCCA--GCT------GCGGTA---ACATCTTTGACCGTCCCTCA-C
target 396 GCTAG-ATGTGGTG-AGGAAAGAAGCCGAAAACTGTGACTGCTTGCAAGGATTTCAAGTA
420 |-|||-.||...||-.||..||.|--.|||.|--||.|||..||-|..||||.|...||.
query 348 G-TAGTCTGCCTTGCTGGCCAGCA--TGAAGA--GTCACTAGTT-CCTGGATCTGCTGTC
target 454 TGCCATTCATTGGGAAGAGGAACTGGATCGGGCATGGGAACACTTTTGATTTCCAAGATC
480 ||--|---|.|||||||||--------|||||-.||||..|||---------||----||
query 402 TG--A---ACTGGGAAGAG--------TCGGG-TTGGGTCCAC---------CC----TC
target 514 CGAGAAGAATACCCAGACCGAATGATGCTCACTTTCTCTGTGTTC---CCTTCCCCTAAG
540 |-||.....||-||.|-..|||-|||--|..|||--||||.|..|---..|..|...|.|
query 435 C-AGGCTGTTA-CCTG-TGGAA-GAT--TGGCTT--TCTGGGGACAGGTGTGGCAGGAGG
target 571 GTGTCTGACACTGTTGTCGAACCTTACAATGCTACTCTCTCTGTTCACCANCTTGTTGAG
600 |.|...|..|-.|.|||-|--||-|.|..|.||.|..|..|..|||-----|||.|----
query 487 GAGGGGGGTA-GGGTGT-G--CC-TGCCCTCCTCCAGTGCCGTTTC-----CTTCT----
target 631 AATGCTGATGAATGTATGGTTTTT 655
660 -.|.||..||--.|||-||---|. 684
query 533 -CTCCTAGTG--GGTA-GG---TG 550
Pairwise global alignment is a practical first comparison when you want to quantify overall similarity between two homologs. ## Ranking Homolog Similarity Across Multiple Pairs
from Bio import SeqIO
from Bio.Align import PairwiseAligner
from itertools import combinations
records = [
SeqIO.read("lupine.nu", "fasta"),
SeqIO.read("lavender.nu", "fasta"),
SeqIO.read("phlox.nu", "fasta"),
]
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -3
aligner.extend_gap_score = -1
pair_scores = []
for r1, r2 in combinations(records, 2):
score = aligner.score(str(r1.seq), str(r2.seq))
pair_scores.append((r1.id, r2.id, score))
pair_scores.sort(key=lambda x: x[2], reverse=True)
print("Pairwise homolog ranking:")
for item in pair_scores:
print(item)Pairwise homolog ranking:
('gi|5049839|gb|AI730987.1|AI730987', 'gi|5052071|gb|AF067555.1|AF067555', 228.0)
('gi|5690369|gb|AF158246.1|AF158246', 'gi|5052071|gb|AF067555.1|AF067555', 195.0)
('gi|5049839|gb|AI730987.1|AI730987', 'gi|5690369|gb|AF158246.1|AF158246', 185.0)
Ranking pairwise scores gives a practical similarity overview before selecting candidates for phylogenetic tree construction. ## Preparing a Combined FASTA for Downstream MSA Tools
from Bio import SeqIO
records = [
SeqIO.read("lupine.nu", "fasta"),
SeqIO.read("lavender.nu", "fasta"),
SeqIO.read("phlox.nu", "fasta"),
]
SeqIO.write(records, "homologs_combined.fasta", "fasta")
print("Wrote homologs_combined.fasta")Wrote homologs_combined.fasta
Exporting a combined FASTA is a practical handoff step to external MSA tools or collaborative workflows where one shared input file is needed.