Pairwise Sequence Alignment in Biopython

Pairwise sequence alignment is one of the most useful tools in bioinformatics. It helps you compare two DNA, RNA, or protein sequences so you can spot similarities, differences, mutations, insertions, and deletions. This matters in real research because alignment is often the first step in understanding whether two sequences may share a function, come from a common ancestor, or represent different versions of the same gene.

In this tutorial, you will learn how to perform pairwise sequence alignment in Biopython using the modern `PairwiseAligner` interface. You will see how to do global and local alignment, change scoring rules, inspect results, and work with both nucleotide and protein sequences.

## What pairwise alignment means

A pairwise alignment arranges two sequences so that matching characters line up as well as possible. To do that, the alignment algorithm may introduce gaps.

For example, two DNA sequences might align like this:

* `ACGTACGT`
* `ACG-ACGT`

The gap makes it easier to compare the rest of the letters position by position.

There are two major alignment styles:

* **Global alignment** compares the sequences from beginning to end.
* **Local alignment** finds the best matching region inside the sequences.

In Biopython, both are handled with `Bio.Align.PairwiseAligner`.

## Your first alignment with `PairwiseAligner`

Let’s start with a simple global alignment of two short DNA sequences.

from Bio.Align import PairwiseAligner

seq1 = "ACCGT"
seq2 = "ACG"

aligner = PairwiseAligner()
aligner.mode = "global"

alignments = aligner.align(seq1, seq2)

print("Number of alignments:", len(alignments))
print("Best score:", alignments.score)
print(alignments[0])
Number of alignments: 2
Best score: 1.0
target            0 ACCGT 5
                  0 ||-|- 5
query             0 AC-G- 3

This code creates a `PairwiseAligner` object, sets it to global mode, and aligns two sequences. The result is a collection of possible alignments, ordered by score. `alignments[0]` gives the highest-scoring alignment.

## Global alignment

Global alignment is best when the two sequences are expected to be similar across their full length. For example, this is common when comparing two versions of the same gene.

from Bio.Align import PairwiseAligner

seq1 = "GATTACA"
seq2 = "GCATGCU"

aligner = PairwiseAligner()
aligner.mode = "global"

# Set simple scoring rules
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

alignments = aligner.align(seq1, seq2)

best_alignment = alignments[0]

print("Alignment score:", best_alignment.score)
print(best_alignment)
Alignment score: -1.0
target            0 GATTACA 7
                  0 |..|.|. 7
query             0 GCATGCU 7

This example uses explicit scoring rules. A match adds to the score, while mismatches and gaps reduce it. The gap-opening penalty is larger than the gap-extension penalty, which reflects the idea that starting a gap is often less likely than extending one.

## Local alignment

Local alignment finds the best matching region rather than forcing the entire sequences to align. This is useful when one sequence may contain only a short region similar to the other.

from Bio.Align import PairwiseAligner

seq1 = "TTTACCGTAAA"
seq2 = "ACCGT"

aligner = PairwiseAligner()
aligner.mode = "local"
aligner.match_score = 2.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

alignments = aligner.align(seq1, seq2)

best_alignment = alignments[0]

print("Alignment score:", best_alignment.score)
print(best_alignment)
Alignment score: 10.0
target            3 ACCGT 8
                  0 ||||| 5
query             0 ACCGT 5

Here, local mode finds the highest-scoring matching region inside the longer sequence. This is helpful when searching for conserved motifs, domains, or subsequences.

## Understanding the scoring system

Alignment depends heavily on the scoring scheme. Changing the scores can change which alignment is considered best.

from Bio.Align import PairwiseAligner

seq1 = "ACCGT"
seq2 = "ACG"

aligner = PairwiseAligner()
aligner.mode = "global"

aligner.match_score = 2.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -3.0
aligner.extend_gap_score = -1.0

print("Match score:", aligner.match_score)
print("Mismatch score:", aligner.mismatch_score)
print("Gap open score:", aligner.open_gap_score)
print("Gap extend score:", aligner.extend_gap_score)

alignments = aligner.align(seq1, seq2)
print(alignments[0])
Match score: 2.0
Mismatch score: -1.0
Gap open score: -3.0
Gap extend score: -1.0
target            0 ACCGT 5
                  0 |-||- 5
query             0 A-CG- 3

The scoring system determines how the aligner balances exact matches against mismatches and gaps. Larger gap penalties discourage insertions and deletions, while higher match scores reward similarity more strongly.

## Accessing alignment coordinates

Biopython alignments store coordinate information that tells you how the aligned blocks map onto each sequence.

from Bio.Align import PairwiseAligner

seq1 = "ACCGT"
seq2 = "ACG"

aligner = PairwiseAligner()
aligner.mode = "global"

alignment = aligner.align(seq1, seq2)[0]

print(alignment)
print("Coordinates:")
print(alignment.coordinates)
target            0 ACCGT 5
                  0 ||-|- 5
query             0 AC-G- 3

Coordinates:
[[0 2 3 4 5]
 [0 2 2 3 3]]
The `coordinates` array shows the aligned regions in each sequence. This is useful when you want to analyze where matches and gaps occur programmatically instead of only printing the alignment.

## Iterating through multiple high-scoring alignments

Sometimes more than one alignment has the same best score. You can inspect several of them.

from Bio.Align import PairwiseAligner

seq1 = "AACT"
seq2 = "ACT"

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -1.0
aligner.extend_gap_score = -0.5

alignments = aligner.align(seq1, seq2)

for i, alignment in enumerate(list(alignments)[:3], start=1):
    print(f"Alignment {i}")
    print("Score:", alignment.score)
    print(alignment)
    print("-" * 40)
Alignment 1
Score: 2.0
target            0 AACT 4
                  0 -||| 4
query             0 -ACT 3

----------------------------------------
Alignment 2
Score: 2.0
target            0 AACT 4
                  0 |-|| 4
query             0 A-CT 3

----------------------------------------
This example prints the first few alignments. When several alignments tie for the top score, checking more than one can help you understand alternative ways the sequences could be matched.

## Comparing protein sequences

Pairwise alignment is also very important for protein sequences. For proteins, substitution matrices are often better than simple match and mismatch scores because some amino acid substitutions are more biologically plausible than others.

from Bio.Align import PairwiseAligner
from Bio.Align import substitution_matrices

seq1 = "MEEPQSDPSV"
seq2 = "MEEPRSDPSI"

aligner = PairwiseAligner()
aligner.mode = "global"

# Load a common substitution matrix for proteins
matrix = substitution_matrices.load("BLOSUM62")
aligner.substitution_matrix = matrix

aligner.open_gap_score = -10.0
aligner.extend_gap_score = -0.5

alignments = aligner.align(seq1, seq2)

best_alignment = alignments[0]

print("Alignment score:", best_alignment.score)
print(best_alignment)
Alignment score: 47.0
target            0 MEEPQSDPSV 10
                  0 ||||.||||. 10
query             0 MEEPRSDPSI 10

In this example, `BLOSUM62` is used instead of simple match and mismatch scores. This gives a more realistic scoring system for protein evolution because different amino acid substitutions are weighted differently.

## Why substitution matrices matter

When aligning proteins, treating every mismatch the same is usually too simplistic. Some substitutions, such as leucine to isoleucine, are more conservative than others.

from Bio.Align import PairwiseAligner
from Bio.Align import substitution_matrices

seq1 = "VLSPADKTNVKAAW"
seq2 = "VLSEGEWQLVLHVW"

aligner = PairwiseAligner()
aligner.mode = "local"

aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -10.0
aligner.extend_gap_score = -0.5

alignment = aligner.align(seq1, seq2)[0]

print("Best local alignment score:", alignment.score)
print(alignment)
Best local alignment score: 17.0
target            0 VLSPADKTNVKAAW 14
                  0 |||......|...| 14
query             0 VLSEGEWQLVLHVW 14

This local protein alignment looks for the best matching region using a substitution matrix. That makes it more useful for proteins than a basic character-by-character scoring system.

## Calculating only the alignment score

Sometimes you do not need the full alignment text. You may only want the score, especially if you are comparing many sequence pairs.

from Bio.Align import PairwiseAligner

seq1 = "ACGTAG"
seq2 = "ACGTCG"

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 2.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

score = aligner.score(seq1, seq2)

print("Alignment score:", score)
Alignment score: 9.0
Using `aligner.score()` is faster and simpler when you only need a numeric similarity measure and do not need to inspect the alignment itself.

## Reading sequences from FASTA files before aligning

In real projects, your sequences often come from files rather than being typed directly into the script. Biopython’s `SeqIO` module makes this easy.

from Bio import SeqIO
from Bio.Align import PairwiseAligner

# Create a small FASTA file for demonstration
with open("example_sequences.fasta", "w", encoding="utf-8") as handle:
    handle.write(">seq1\nACCGTGGAA\n>seq2\nACGTCGAA\n")

records = list(SeqIO.parse("example_sequences.fasta", "fasta"))

seq1 = str(records[0].seq)
seq2 = str(records[1].seq)

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

alignment = aligner.align(seq1, seq2)[0]

print(f"Comparing {records[0].id} and {records[1].id}")
print(alignment)
Comparing seq1 and seq2
target            0 ACCGTGGAA 9
                  0 |-|||.||| 9
query             0 A-CGTCGAA 8

This example writes a small FASTA file, reads the sequences with `SeqIO.parse()`, and aligns them. In a real workflow, you would replace the example file with your own data.

## Downloading a FASTA file from the web and aligning two sequences

You can also download sequence data directly inside a script and then align it.

import requests
from Bio import SeqIO
from Bio.Align import PairwiseAligner

url = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/opuntia.fasta"
response = requests.get(url, timeout=30)
response.raise_for_status()

with open("downloaded_sequences.fasta", "w", encoding="utf-8") as handle:
    handle.write(response.text)

records = list(SeqIO.parse("downloaded_sequences.fasta", "fasta"))

seq1 = str(records[0].seq)
seq2 = str(records[1].seq)

aligner = PairwiseAligner()
aligner.mode = "local"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

alignment = aligner.align(seq1, seq2)[0]

print(f"Sequence 1 ID: {records[0].id}")
print(f"Sequence 2 ID: {records[1].id}")
print("Local alignment score:", alignment.score)
print(alignment)
Sequence 1 ID: gi|6273291|gb|AF191665.1|AF191665
Sequence 2 ID: gi|6273290|gb|AF191664.1|AF191664
Local alignment score: 885.5
target            0 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT
                  0 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query             0 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT

target           60 CTAAATGATATAGGATTCCACTATGTAAGGTCTTTGAATCATATCATAAAAGACAATGTA
                 60 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query            60 CTAAATGATATAGGATTCCACTATGTAAGGTCTTTGAATCATATCATAAAAGACAATGTA

target          120 ATAAAGCATGAATACAGATTCACACATAATTATCTGATATGAATCTATTCATAGAAAAAA
                120 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           120 ATAAAGCATGAATACAGATTCACACATAATTATCTGATATGAATCTATTCATAGAAAAAA

target          180 GAAAAAAGTAAGAGCCTCCGGCCAATAAAGACTAAGAGGGTTGGCTCAAGAACAAAGTTC
                180 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           180 GAAAAAAGTAAGAGCCTCCGGCCAATAAAGACTAAGAGGGTTGGCTCAAGAACAAAGTTC

target          240 ATTAAGAGCTCCATTGTAGAATTCAGACCTAATCATTAATCAAGAAGCGATGGGAACGAT
                240 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           240 ATTAAGAGCTCCATTGTAGAATTCAGACCTAATCATTAATCAAGAAGCGATGGGAACGAT

target          300 GTAATCCATGAATACAGAAGATTCAATTGAAAAAGATCCT-ATGNT-CATT-GGAAGGAT
                300 ||||||||||||||||||||||||||||||||||||||||-|||||-||||-||||||||
query           300 GTAATCCATGAATACAGAAGATTCAATTGAAAAAGATCCTAATGNTNCATTGGGAAGGAT

target          357 GGCGGAACGAACCAGAGACCAATTCATCTATTCTGAAAAGTGATAAACTAATCCTATAAA
                360 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           360 GGCGGAACGAACCAGAGACCAATTCATCTATTCTGAAAAGTGATAAACTAATCCTATAAA

target          417 ACTAAAATAGATATTGAAAGAGTAAATATTCGCCCGCGAAAATTCCTTTTTTATTAAATT
                420 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           420 ACTAAAATAGATATTGAAAGAGTAAATATTCGCCCGCGAAAATTCCTTTTTTATTAAATT

target          477 GCTCATATTTTCTTTTAGCAATGCAATCTAATAAAATATATCTATACAAAAAAACATAGA
                480 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           480 GCTCATATTTTCTTTTAGCAATGCAATCTAATAAAATATATCTATACAAAAAAACATAGA

target          537 CAAACTATATATATATATATATATAATATATTTCAAATTCCCTTATATATCCAAATATAA
                540 |||||------|||||||||||||||||||||||||||||||||||||||||||||||||
query           540 CAAAC------TATATATATATATAATATATTTCAAATTCCCTTATATATCCAAATATAA

target          597 AAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGTATTATTAAATG
                600 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           594 AAATATCTAATAAATTAGATGAATATCAAAGAATCTATTGATTTAGTGTATTATTAAATG

target          657 TATATATTAATTCAATATTATTATTCTATTCATTTTTATTCATTTTCAAATTTATAATAT
                660 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           654 TATATATTAATTCAATATTATTATTCTATTCATTTTTATTCATTTTCAAATTTATAATAT

target          717 ATTAATCTATATATTAATTTAGAATTCTATTCTAATTCGAATTCAATTTTTAAATATTCA
                720 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           714 ATTAATCTATATATTAATTTAGAATTCTATTCTAATTCGAATTCAATTTTTAAATATTCA

target          777 TATTCAATTAAAATTGAAATTTTTTCATTCGCGAGGAGCCGGATGAGAAGAAACTCTCAT
                780 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           774 TATTCAATTAAAATTGAAATTTTTTCATTCGCGAGGAGCCGGATGAGAAGAAACTCTCAT

target          837 GTCCGGTTCTGTAGTAGAGATGGAATTAAGAAAAAACCATCAACTATAACCCCAAAAGAA
                840 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
query           834 GTCCGGTTCTGTAGTAGAGATGGAATTAAGAAAAAACCATCAACTATAACCCCAAAAGAA

target          897 CCAGA 902
                900 ||||| 905
query           894 CCAGA 899

This code downloads a FASTA file, saves it locally, reads the first two sequences, and performs a local alignment. It shows how alignment can fit into a larger data-processing workflow.

## Changing end gap behavior

In some cases, you may want terminal gaps to be treated differently from internal gaps. Biopython gives you control over this through end-gap scoring settings.

from Bio.Align import PairwiseAligner

seq1 = "ACCGT"
seq2 = "CGT"

aligner = PairwiseAligner()
aligner.mode = "global"
aligner.match_score = 1.0
aligner.mismatch_score = -1.0
aligner.open_gap_score = -2.0
aligner.extend_gap_score = -0.5

# Make end gaps less costly on the target sequence
aligner.end_insertion_score = 0.0
aligner.end_deletion_score = 0.0

alignment = aligner.align(seq1, seq2)[0]

print("Alignment score:", alignment.score)
print(alignment)
Alignment score: 3.0
target            0 ACCGT 5
                  0 --||| 5
query             0 --CGT 3

This example changes the cost of gaps at the ends of sequences. That can be useful when one sequence is expected to be a fragment of the other, rather than a full-length sequence.

## A small reusable alignment function

As your scripts grow, it is helpful to wrap alignment logic in a function.

from Bio.Align import PairwiseAligner

def align_sequences(seq1, seq2, mode="global"):
    aligner = PairwiseAligner()
    aligner.mode = mode
    aligner.match_score = 1.0
    aligner.mismatch_score = -1.0
    aligner.open_gap_score = -2.0
    aligner.extend_gap_score = -0.5

    alignment = aligner.align(seq1, seq2)[0]
    return alignment

dna1 = "ATGCTAGC"
dna2 = "ATGCGAGC"

result = align_sequences(dna1, dna2, mode="global")

print("Score:", result.score)
print(result)
Score: 6.0
target            0 ATGCTAGC 8
                  0 ||||.||| 8
query             0 ATGCGAGC 8

This function makes your code easier to reuse and modify. You can call it for different sequence pairs without rewriting the scoring setup each time.

## Common mistakes

### Using the wrong alignment mode

A global alignment may force two unrelated sequence ends to line up badly. A local alignment may ignore important differences outside the best matching region. Choose the mode based on the biological question.

### Forgetting that scores depend on the scoring scheme

An alignment score only makes sense relative to the scoring settings you used. Changing the match score, gap penalties, or substitution matrix can change both the score and the alignment itself.

### Treating DNA and protein alignment the same way

DNA often works well with simple match and mismatch rules. Protein alignment usually benefits from substitution matrices such as `BLOSUM62`.

### Assuming there is only one best alignment

Sometimes several alignments have the same score. Looking at only the first one may hide equally valid alternatives.

## When to use global versus local alignment

Use **global alignment** when:

* the sequences are expected to be similar across most or all of their lengths
* you are comparing full genes or full proteins
* you want an end-to-end comparison

Use **local alignment** when:

* one sequence may contain only a matching region
* you want to find a conserved motif or domain
* the sequences may differ greatly outside the matching region

## Conclusion

Pairwise sequence alignment in Biopython is a practical way to compare biological sequences and explore their similarities. With `PairwiseAligner`, you can perform both global and local alignment, control the scoring system, work with DNA or proteins, and integrate alignment into larger workflows that read from files or download data from the web.

Once you are comfortable with pairwise alignment, the next natural step is to explore multiple sequence alignment and more advanced sequence analysis tools in Biopython.