Downloading and Aligning Homologous Genes with Biopython

Homologous gene comparisons are a core task in comparative genomics. A practical workflow usually starts with downloading related sequences, checking quality, and aligning pairs before deeper analyses like tree building.

In this tutorial, you will download homologous nucleotide sequences and align them with Biopython.

## Downloading Homologous Gene Sequences

import requests
from Bio import SeqIO

# Download three homologous nucleotide sequences from related plant examples
urls = {
    "lupine.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/lupine.nu",
    "lavender.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/lavender.nu",
    "phlox.nu": "https://raw.githubusercontent.com/biopython/biopython/master/Tests/Fasta/phlox.nu",
}

records = []
for filename, url in urls.items():
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(response.text)
    records.append(SeqIO.read(filename, "fasta"))

print("Downloaded records:", [r.id for r in records])
print("Sequence lengths:", [len(r.seq) for r in records])
Downloaded records: ['gi|5049839|gb|AI730987.1|AI730987', 'gi|5690369|gb|AF158246.1|AF158246', 'gi|5052071|gb|AF067555.1|AF067555']
Sequence lengths: [655, 550, 623]
This block downloads and stores homologous sequences locally so later alignment steps can be rerun without additional network calls.

## Pairwise Global Alignment of Homologs

from Bio import SeqIO
from Bio.Align import PairwiseAligner

# Read local FASTA records
lupine = SeqIO.read("lupine.nu", "fasta")
lavender = SeqIO.read("lavender.nu", "fasta")

# Configure a global aligner for homolog comparison
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -3
aligner.extend_gap_score = -1

score = aligner.score(str(lupine.seq), str(lavender.seq))
alignment = aligner.align(str(lupine.seq), str(lavender.seq))[0]

print("Alignment score:", score)
print(alignment)
Alignment score: 185.0
target            0 GAAAATTCATTTTCTTTGGACTTTCTCTGAAATCCGAGTCCTAGGAAAG-ATGCGTGAGA
                  0 -----------------||-----||||.||.|-|..||-|||||.|.|-.|||...||.
query             0 -----------------GG-----CTCTTAAGT-CATGT-CTAGGCAGGTGTGCACAAGT

target           59 TT-CTTCATATTCAGGCAGGGCA-GTGCGGCAACCAGATCGGTGCCAATTTTTGGGAGGT
                 60 ||-..||.|.|||.|.||--|||-.||.||-----||||.||-|-------||||||..|
query            36 TTAGGTCTTGTTCCGCCA--GCATCTGAGG-----AGATTGG-G-------TTGGGAACT

target          117 GGTTTGC--GCCGAACA-CGGCATAAATTC---AACGGGA---CGCTATCAAGGGGACAA
                120 |||.|||--|.|-|.||-|.|..|..|..|---|.|||||---|.||-|||...|--||.
query            81 GGTATGCATGTC-ACCATCTGAGTTTAGCCCAGATCGGGAGTCCTCT-TCATCAG--CAT

target          168 TGATTTGCAACTGGAGCGCGTTAACGTTTACTACAATGAAGCGAGT--TGC-GGGAGGTT
                180 |||---|.||.|.|||...|..|..|.|--|-|.|||||.|.|.||--.||-.|||..|.
query           137 TGA---GAAAGTAGAGACTGGCACAGGT--C-ATAATGAGGTGGGTCAAGCATGGATATG

target          225 CGTTCCACGAGCCGTCCTTATGGACTTGG---AGCCTGGCACTATGGAT-AGTGTTAGAT
                240 .|..||..|..|.|.|.|.|..||...||---||.||.|||-||.|.||-||..||...|
query           191 AGGCCCTGGTTCTGCCATCACCGAGACGGCAAAGGCTAGCA-TAGGTATGAGACTTCCTT

target          281 CCGGGCCGTAC-GGGCAGATTTTCAGGCCTGACAACTTTGTTTTCGGCCAGTCCGG--TG
                300 ||.|.|.||.|-|.|.|-|.|...|-||||-|||--...||||..||||--|..||--||
query           250 CCTGCCAGTGCTGAGGA-ACTGGGA-GCCT-ACA--GAGGTTTGAGGCC--TTGGGTCTG

target          338 CAGGGAATAACTGGGCCAAGGGTCATTACACGGAAGGGGCCGAGTTGATCG--ACTCANT
                360 |-..|||.||--.|||||--|.|------.|||.|---.|....||||.||--.||||-.
query           303 C-CAGAAGAA--AGGCCA--GCT------GCGGTA---ACATCTTTGACCGTCCCTCA-C

target          396 GCTAG-ATGTGGTG-AGGAAAGAAGCCGAAAACTGTGACTGCTTGCAAGGATTTCAAGTA
                420 |-|||-.||...||-.||..||.|--.|||.|--||.|||..||-|..||||.|...||.
query           348 G-TAGTCTGCCTTGCTGGCCAGCA--TGAAGA--GTCACTAGTT-CCTGGATCTGCTGTC

target          454 TGCCATTCATTGGGAAGAGGAACTGGATCGGGCATGGGAACACTTTTGATTTCCAAGATC
                480 ||--|---|.|||||||||--------|||||-.||||..|||---------||----||
query           402 TG--A---ACTGGGAAGAG--------TCGGG-TTGGGTCCAC---------CC----TC

target          514 CGAGAAGAATACCCAGACCGAATGATGCTCACTTTCTCTGTGTTC---CCTTCCCCTAAG
                540 |-||.....||-||.|-..|||-|||--|..|||--||||.|..|---..|..|...|.|
query           435 C-AGGCTGTTA-CCTG-TGGAA-GAT--TGGCTT--TCTGGGGACAGGTGTGGCAGGAGG

target          571 GTGTCTGACACTGTTGTCGAACCTTACAATGCTACTCTCTCTGTTCACCANCTTGTTGAG
                600 |.|...|..|-.|.|||-|--||-|.|..|.||.|..|..|..|||-----|||.|----
query           487 GAGGGGGGTA-GGGTGT-G--CC-TGCCCTCCTCCAGTGCCGTTTC-----CTTCT----

target          631 AATGCTGATGAATGTATGGTTTTT 655
                660 -.|.||..||--.|||-||---|. 684
query           533 -CTCCTAGTG--GGTA-GG---TG 550

Pairwise global alignment is a practical first comparison when you want to quantify overall similarity between two homologs.

## Ranking Homolog Similarity Across Multiple Pairs

from Bio import SeqIO
from Bio.Align import PairwiseAligner
from itertools import combinations

records = [
    SeqIO.read("lupine.nu", "fasta"),
    SeqIO.read("lavender.nu", "fasta"),
    SeqIO.read("phlox.nu", "fasta"),
]

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2
aligner.mismatch_score = -1
aligner.open_gap_score = -3
aligner.extend_gap_score = -1

pair_scores = []
for r1, r2 in combinations(records, 2):
    score = aligner.score(str(r1.seq), str(r2.seq))
    pair_scores.append((r1.id, r2.id, score))

pair_scores.sort(key=lambda x: x[2], reverse=True)
print("Pairwise homolog ranking:")
for item in pair_scores:
    print(item)
Pairwise homolog ranking:
('gi|5049839|gb|AI730987.1|AI730987', 'gi|5052071|gb|AF067555.1|AF067555', 228.0)
('gi|5690369|gb|AF158246.1|AF158246', 'gi|5052071|gb|AF067555.1|AF067555', 195.0)
('gi|5049839|gb|AI730987.1|AI730987', 'gi|5690369|gb|AF158246.1|AF158246', 185.0)
Ranking pairwise scores gives a practical similarity overview before selecting candidates for phylogenetic tree construction.

## Preparing a Combined FASTA for Downstream MSA Tools

from Bio import SeqIO

records = [
    SeqIO.read("lupine.nu", "fasta"),
    SeqIO.read("lavender.nu", "fasta"),
    SeqIO.read("phlox.nu", "fasta"),
]

SeqIO.write(records, "homologs_combined.fasta", "fasta")
print("Wrote homologs_combined.fasta")
Wrote homologs_combined.fasta
Exporting a combined FASTA is a practical handoff step to external MSA tools or collaborative workflows where one shared input file is needed.