Pairwise sequence alignment is one of the most useful tools in bioinformatics. It helps you compare two DNA, RNA, or protein sequences so you can spot similarities, differences, mutations, insertions, and deletions. This matters in real research because alignment is often the first step in understanding whether two sequences may share a function, come from a common ancestor, or represent different versions of the same gene. In this tutorial, you will learn how to perform pairwise sequence alignment in Biopython using the modern `PairwiseAligner` interface. You will see how to do global and local alignment, change scoring rules, inspect results, and work with both nucleotide and protein sequences. ## What pairwise alignment means A pairwise alignment arranges two sequences so that matching characters line up as well as possible. To do that, the alignment algorithm may introduce gaps. For example, two DNA sequences might align like this: * `ACGTACGT` * `ACG-ACGT` The gap makes it easier to compare the rest of the letters position by position. There are two major alignment styles: * **Global alignment** compares the sequences from beginning to end. * **Local alignment** finds the best matching region inside the sequences. In Biopython, both are handled with `Bio.Align.PairwiseAligner`. ## Your first alignment with `PairwiseAligner` Let’s start with a simple global alignment of two short DNA sequences.
from Bio.Align import PairwiseAligner
seq1 = "ACCGT"
seq2 = "ACG"
aligner = PairwiseAligner()
aligner.mode = "global"
alignments = aligner.align(seq1, seq2)
print("Number of alignments:", len(alignments))
print("Best score:", alignments.score)
print(alignments[0])Number of alignments: 2
Best score: 1.0
target 0 ACCGT 5
0 ||-|- 5
query 0 AC-G- 3
This code creates a `PairwiseAligner` object, sets it to global mode, and aligns two sequences. The result is a collection of possible alignments, ordered by score. `alignments[0]` gives the highest-scoring alignment. ## Global alignment Global alignment is best when the two sequences are expected to be similar across their full length. For example, this is common when comparing two versions of the same gene.
from Bio.Align import PairwiseAligner
seq1 = "GATTACA"
seq2 = "GCATGCU"
aligner = PairwiseAligner()
aligner.mode = "global"
# Set simple scoring rules
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5
alignments = aligner.align(seq1, seq2)
best_alignment = alignments[0]
print("Alignment score:", best_alignment.score)
print(best_alignment)Alignment score: -1.0
target 0 GATTACA 7
0 |..|.|. 7
query 0 GCATGCU 7
This example uses explicit scoring rules. A match adds to the score, while mismatches and gaps reduce it. The gap-opening penalty is larger than the gap-extension penalty, which reflects the idea that starting a gap is often less likely than extending one. ## Local alignment Local alignment finds the best matching region rather than forcing the entire sequences to align. This is useful when one sequence may contain only a short region similar to the other.
from Bio.Align import PairwiseAligner
seq1 = "TTTACCGTAAA"
seq2 = "ACCGT"
aligner = PairwiseAligner()
aligner.mode = "local"
aligner.match_score = 2.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5
alignments = aligner.align(seq1, seq2)
best_alignment = alignments[0]
print("Alignment score:", best_alignment.score)
print(best_alignment)Alignment score: 10.0
target 3 ACCGT 8
0 ||||| 5
query 0 ACCGT 5
Here, local mode finds the highest-scoring matching region inside the longer sequence. This is helpful when searching for conserved motifs, domains, or subsequences. ## Understanding the scoring system Alignment depends heavily on the scoring scheme. Changing the scores can change which alignment is considered best.
from Bio.Align import PairwiseAligner
seq1 = "ACCGT"
seq2 = "ACG"
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -3.0
aligner.extend_gap_score = -1.0
print("Match score:", aligner.match_score)
print("Mismatch score:", aligner.mismatch_score)
print("Gap open score:", aligner.open_gap_score)
print("Gap extend score:", aligner.extend_gap_score)
alignments = aligner.align(seq1, seq2)
print(alignments[0])Match score: 2.0
Mismatch score: -1.0
Gap open score: -3.0
Gap extend score: -1.0
target 0 ACCGT 5
0 |-||- 5
query 0 A-CG- 3
The scoring system determines how the aligner balances exact matches against mismatches and gaps. Larger gap penalties discourage insertions and deletions, while higher match scores reward similarity more strongly. ## Accessing alignment coordinates Biopython alignments store coordinate information that tells you how the aligned blocks map onto each sequence.
from Bio.Align import PairwiseAligner
seq1 = "ACCGT"
seq2 = "ACG"
aligner = PairwiseAligner()
aligner.mode = "global"
alignment = aligner.align(seq1, seq2)[0]
print(alignment)
print("Coordinates:")
print(alignment.coordinates)target 0 ACCGT 5
0 ||-|- 5
query 0 AC-G- 3
Coordinates:
[[0 2 3 4 5]
[0 2 2 3 3]]
The `coordinates` array shows the aligned regions in each sequence. This is useful when you want to analyze where matches and gaps occur programmatically instead of only printing the alignment. ## Iterating through multiple high-scoring alignments Sometimes more than one alignment has the same best score. You can inspect several of them.
from Bio.Align import PairwiseAligner
seq1 = "AACT"
seq2 = "ACT"
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -1.0
aligner.extend_gap_score = -0.5
alignments = aligner.align(seq1, seq2)
for i, alignment in enumerate(list(alignments)[:3], start=1):
print(f"Alignment {i}")
print("Score:", alignment.score)
print(alignment)
print("-" * 40)Alignment 1
Score: 2.0
target 0 AACT 4
0 -||| 4
query 0 -ACT 3
----------------------------------------
Alignment 2
Score: 2.0
target 0 AACT 4
0 |-|| 4
query 0 A-CT 3
----------------------------------------
This example prints the first few alignments. When several alignments tie for the top score, checking more than one can help you understand alternative ways the sequences could be matched. ## Comparing protein sequences Pairwise alignment is also very important for protein sequences. For proteins, substitution matrices are often better than simple match and mismatch scores because some amino acid substitutions are more biologically plausible than others.
from Bio.Align import PairwiseAligner
from Bio.Align import substitution_matrices
seq1 = "MEEPQSDPSV"
seq2 = "MEEPRSDPSI"
aligner = PairwiseAligner()
aligner.mode = "global"
# Load a common substitution matrix for proteins
matrix = substitution_matrices.load("BLOSUM62")
aligner.substitution_matrix = matrix
aligner.open_gap_score = -10.0
aligner.extend_gap_score = -0.5
alignments = aligner.align(seq1, seq2)
best_alignment = alignments[0]
print("Alignment score:", best_alignment.score)
print(best_alignment)Alignment score: 47.0
target 0 MEEPQSDPSV 10
0 ||||.||||. 10
query 0 MEEPRSDPSI 10
In this example, `BLOSUM62` is used instead of simple match and mismatch scores. This gives a more realistic scoring system for protein evolution because different amino acid substitutions are weighted differently. ## Why substitution matrices matter When aligning proteins, treating every mismatch the same is usually too simplistic. Some substitutions, such as leucine to isoleucine, are more conservative than others.
from Bio.Align import PairwiseAligner
from Bio.Align import substitution_matrices
seq1 = "VLSPADKTNVKAAW"
seq2 = "VLSEGEWQLVLHVW"
aligner = PairwiseAligner()
aligner.mode = "local"
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -10.0
aligner.extend_gap_score = -0.5
alignment = aligner.align(seq1, seq2)[0]
print("Best local alignment score:", alignment.score)
print(alignment)Best local alignment score: 17.0
target 0 VLSPADKTNVKAAW 14
0 |||......|...| 14
query 0 VLSEGEWQLVLHVW 14
This local protein alignment looks for the best matching region using a substitution matrix. That makes it more useful for proteins than a basic character-by-character scoring system. ## Calculating only the alignment score Sometimes you do not need the full alignment text. You may only want the score, especially if you are comparing many sequence pairs.
from Bio.Align import PairwiseAligner
seq1 = "ACGTAG"
seq2 = "ACGTCG"
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5
score = aligner.score(seq1, seq2)
print("Alignment score:", score)Alignment score: 9.0
Using `aligner.score()` is faster and simpler when you only need a numeric similarity measure and do not need to inspect the alignment itself. ## Reading sequences from FASTA files before aligning In real projects, your sequences often come from files rather than being typed directly into the script. Biopython’s `SeqIO` module makes this easy.
from Bio import SeqIO
from Bio.Align import PairwiseAligner
# Create a small FASTA file for demonstration
with open("example_sequences.fasta", "w", encoding="utf-8") as handle:
handle.write(">seq1\nACCGTGGAA\n>seq2\nACGTCGAA\n")
records = list(SeqIO.parse("example_sequences.fasta", "fasta"))
seq1 = str(records[0].seq)
seq2 = str(records[1].seq)
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5
alignment = aligner.align(seq1, seq2)[0]
print(f"Comparing {records[0].id} and {records[1].id}")
print(alignment)Comparing seq1 and seq2
target 0 ACCGTGGAA 9
0 |-|||.||| 9
query 0 A-CGTCGAA 8
This example writes a small FASTA file, reads the sequences with `SeqIO.parse()`, and aligns them. In a real workflow, you would replace the example file with your own data. ## Downloading a FASTA file from the web and aligning two sequences You can also download sequence data directly inside a script and then align it.
import requests
from Bio import SeqIO
from Bio.Align import PairwiseAligner
url = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/opuntia.fasta"
response = requests.get(url, timeout=30)
response.raise_for_status()
with open("downloaded_sequences.fasta", "w", encoding="utf-8") as handle:
handle.write(response.text)
records = list(SeqIO.parse("downloaded_sequences.fasta", "fasta"))
seq1 = str(records[0].seq)
seq2 = str(records[1].seq)
aligner = PairwiseAligner()
aligner.mode = "local"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5
alignment = aligner.align(seq1, seq2)[0]
print(f"Sequence 1 ID: {records[0].id}")
print(f"Sequence 2 ID: {records[1].id}")
print("Local alignment score:", alignment.score)
print(alignment)Sequence 1 ID: gi|6273291|gb|AF191665.1|AF191665
Sequence 2 ID: gi|6273290|gb|AF191664.1|AF191664
Local alignment score: 885.5
target 0 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT
0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 0 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT
target 60 CTAAATGATATAGGATTCCACTATGTAAGGTCTTTGAATCATATCATAAAAGACAATGTA
60 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 60 CTAAATGATATAGGATTCCACTATGTAAGGTCTTTGAATCATATCATAAAAGACAATGTA
target 120 ATAAAGCATGAATACAGATTCACACATAATTATCTGATATGAATCTATTCATAGAAAAAA
120 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 120 ATAAAGCATGAATACAGATTCACACATAATTATCTGATATGAATCTATTCATAGAAAAAA
target 180 GAAAAAAGTAAGAGCCTCCGGCCAATAAAGACTAAGAGGGTTGGCTCAAGAACAAAGTTC
180 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 180 GAAAAAAGTAAGAGCCTCCGGCCAATAAAGACTAAGAGGGTTGGCTCAAGAACAAAGTTC
target 240 ATTAAGAGCTCCATTGTAGAATTCAGACCTAATCATTAATCAAGAAGCGATGGGAACGAT
240 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 240 ATTAAGAGCTCCATTGTAGAATTCAGACCTAATCATTAATCAAGAAGCGATGGGAACGAT
target 300 GTAATCCATGAATACAGAAGATTCAATTGAAAAAGATCCT-ATGNT-CATT-GGAAGGAT
300 ||||||||||||||||||||||||||||||||||||||||-|||||-||||-||||||||
query 300 GTAATCCATGAATACAGAAGATTCAATTGAAAAAGATCCTAATGNTNCATTGGGAAGGAT
target 357 GGCGGAACGAACCAGAGACCAATTCATCTATTCTGAAAAGTGATAAACTAATCCTATAAA
360 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 360 GGCGGAACGAACCAGAGACCAATTCATCTATTCTGAAAAGTGATAAACTAATCCTATAAA
target 417 ACTAAAATAGATATTGAAAGAGTAAATATTCGCCCGCGAAAATTCCTTTTTTATTAAATT
420 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 420 ACTAAAATAGATATTGAAAGAGTAAATATTCGCCCGCGAAAATTCCTTTTTTATTAAATT
target 477 GCTCATATTTTCTTTTAGCAATGCAATCTAATAAAATATATCTATACAAAAAAACATAGA
480 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 480 GCTCATATTTTCTTTTAGCAATGCAATCTAATAAAATATATCTATACAAAAAAACATAGA
target 537 CAAACTATATATATATATATATATAATATATTTCAAATTCCCTTATATATCCAAATATAA
540 |||||------|||||||||||||||||||||||||||||||||||||||||||||||||
query 540 CAAAC------TATATATATATATAATATATTTCAAATTCCCTTATATATCCAAATATAA
target 597 AAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGTATTATTAAATG
600 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 594 AAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGTATTATTAAATG
target 657 TATATATTAATTCAATATTATTATTCTATTCATTTTTATTCATTTTCAAATTTATAATAT
660 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 654 TATATATTAATTCAATATTATTATTCTATTCATTTTTATTCATTTTCAAATTTATAATAT
target 717 ATTAATCTATATATTAATTTAGAATTCTATTCTAATTCGAATTCAATTTTTAAATATTCA
720 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 714 ATTAATCTATATATTAATTTAGAATTCTATTCTAATTCGAATTCAATTTTTAAATATTCA
target 777 TATTCAATTAAAATTGAAATTTTTTCATTCGCGAGGAGCCGGATGAGAAGAAACTCTCAT
780 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 774 TATTCAATTAAAATTGAAATTTTTTCATTCGCGAGGAGCCGGATGAGAAGAAACTCTCAT
target 837 GTCCGGTTCTGTAGTAGAGATGGAATTAAGAAAAAACCATCAACTATAACCCCAAAAGAA
840 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query 834 GTCCGGTTCTGTAGTAGAGATGGAATTAAGAAAAAACCATCAACTATAACCCCAAAAGAA
target 897 CCAGA 902
900 ||||| 905
query 894 CCAGA 899
This code downloads a FASTA file, saves it locally, reads the first two sequences, and performs a local alignment. It shows how alignment can fit into a larger data-processing workflow. ## Changing end gap behavior In some cases, you may want terminal gaps to be treated differently from internal gaps. Biopython gives you control over this through end-gap scoring settings.
from Bio.Align import PairwiseAligner
seq1 = "ACCGT"
seq2 = "CGT"
aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5
# Make end gaps less costly on the target sequence
aligner.end_insertion_score = 0.0
aligner.end_deletion_score = 0.0
alignment = aligner.align(seq1, seq2)[0]
print("Alignment score:", alignment.score)
print(alignment)Alignment score: 3.0
target 0 ACCGT 5
0 --||| 5
query 0 --CGT 3
This example changes the cost of gaps at the ends of sequences. That can be useful when one sequence is expected to be a fragment of the other, rather than a full-length sequence. ## A small reusable alignment function As your scripts grow, it is helpful to wrap alignment logic in a function.
from Bio.Align import PairwiseAligner
def align_sequences(seq1, seq2, mode="global"):
aligner = PairwiseAligner()
aligner.mode = mode
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5
alignment = aligner.align(seq1, seq2)[0]
return alignment
dna1 = "ATGCTAGC"
dna2 = "ATGCGAGC"
result = align_sequences(dna1, dna2, mode="global")
print("Score:", result.score)
print(result)Score: 6.0
target 0 ATGCTAGC 8
0 ||||.||| 8
query 0 ATGCGAGC 8
This function makes your code easier to reuse and modify. You can call it for different sequence pairs without rewriting the scoring setup each time. ## Common mistakes ### Using the wrong alignment mode A global alignment may force two unrelated sequence ends to line up badly. A local alignment may ignore important differences outside the best matching region. Choose the mode based on the biological question. ### Forgetting that scores depend on the scoring scheme An alignment score only makes sense relative to the scoring settings you used. Changing the match score, gap penalties, or substitution matrix can change both the score and the alignment itself. ### Treating DNA and protein alignment the same way DNA often works well with simple match and mismatch rules. Protein alignment usually benefits from substitution matrices such as `BLOSUM62`. ### Assuming there is only one best alignment Sometimes several alignments have the same score. Looking at only the first one may hide equally valid alternatives. ## When to use global versus local alignment Use **global alignment** when: * the sequences are expected to be similar across most or all of their lengths * you are comparing full genes or full proteins * you want an end-to-end comparison Use **local alignment** when: * one sequence may contain only a matching region * you want to find a conserved motif or domain * the sequences may differ greatly outside the matching region ## Conclusion Pairwise sequence alignment in Biopython is a practical way to compare biological sequences and explore their similarities. With `PairwiseAligner`, you can perform both global and local alignment, control the scoring system, work with DNA or proteins, and integrate alignment into larger workflows that read from files or download data from the web. Once you are comfortable with pairwise alignment, the next natural step is to explore multiple sequence alignment and more advanced sequence analysis tools in Biopython.