Restriction enzymes are essential tools in molecular biology. They recognize specific DNA sequences and cut the DNA at or near those sites. If you are studying cloning, plasmid design, PCR products, or general sequence analysis, being able to predict restriction sites is extremely useful.

Biopython includes a dedicated module, `Bio.Restriction`, that lets you search sequences for enzyme cut sites, analyze many enzymes at once, and simulate simple restriction digests in Python. In this tutorial, you will learn how to use Biopython’s restriction enzyme tools to inspect DNA sequences, find which enzymes cut a sequence, examine cut positions, and work with batches of enzymes.

## What `Bio.Restriction` provides

The `Bio.Restriction` module contains classes for many common restriction enzymes, such as `EcoRI`, `BamHI`, and `HindIII`. These enzyme objects know their recognition sequences and cutting behavior. The module also provides tools for:

* searching a DNA sequence for restriction sites
* analyzing many enzymes at once
* selecting enzymes from a batch
* formatting digest summaries

You will usually combine `Bio.Restriction` with `Bio.Seq`.

## Your first restriction enzyme search

Let’s begin with a simple example. We will create a DNA sequence and search it for an `EcoRI` cut site.
```python
from Bio.Seq import Seq
from Bio.Restriction import EcoRI

sequence = Seq("AAGAATTCGCGCGAATTC")

# search() returns a list of cut positions for this enzyme
cut_positions = EcoRI.search(sequence)
print("EcoRI cut positions:", cut_positions)
print("Recognition site:", EcoRI.site)
```

Output:

```
EcoRI cut positions: [4, 14]
Recognition site: GAATTC
```
This code creates a DNA sequence and uses `EcoRI.search()` to find all positions where `EcoRI` cuts. The result is a list of positions in the sequence. The enzyme object also stores its recognition site, which you can inspect through `EcoRI.site`.

## Understanding the result of `search()`

The positions returned by `search()` are cut positions, not the start of the recognition sequence. Each number is the 1-based position of the first base after the cut. In the example above, the first `GAATTC` site starts at position 3, and `EcoRI` cuts between the `G` and the first `A` (`G^AATTC`), so `search()` reports 4. This distinction matters because restriction enzymes often cut a few bases into or near the recognition site.
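To make the difference between a site start and a cut position concrete, here is a minimal pure-Python sketch. It is not Biopython's implementation: it assumes an exact, palindromic site on a linear sequence, and a hypothetical `cut_offset` parameter that gives the number of site bases before the top-strand cut (1 for `EcoRI`, `G^AATTC`). Ambiguous bases, reverse-strand matches of non-palindromic sites, and circular sequences are ignored.

```python
def naive_cut_positions(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return 1-based positions of the first base after each cut.

    Illustrative only: exact, case-insensitive matching of a palindromic
    site on a linear sequence, with the cut `cut_offset` bases into the site.
    """
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        # `start` is the 0-based index of the site; the cut falls after
        # `cut_offset` bases, and +1 converts to 1-based coordinates.
        positions.append(start + cut_offset + 1)
        # Search again one base later so overlapping sites are not missed.
        start = sequence.find(site, start + 1)
    return positions


sequence = "AAGAATTCGCGCGAATTC"
# 1-based starts of each GAATTC site
print([i + 1 for i in range(len(sequence)) if sequence.startswith("GAATTC", i)])  # [3, 13]
print(naive_cut_positions(sequence, "GAATTC", 1))  # [4, 14]
```

On simple inputs like this one, the sketch should agree with `EcoRI.search()`: each site start is shifted by the cut offset.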
```python
from Bio.Seq import Seq
from Bio.Restriction import BamHI

sequence = Seq("GGATCCAAAAGGATCC")
positions = BamHI.search(sequence)

print("BamHI recognition site:", BamHI.site)
print("BamHI cut positions:", positions)
```

Output:

```
BamHI recognition site: GGATCC
BamHI cut positions: [2, 12]
```
This example searches for `BamHI` sites. `BamHI` recognizes `GGATCC` and cuts after the first `G` (`G^GATCC`), so each reported position is one base past the start of a site: the sites start at positions 1 and 11, and the cuts are reported at 2 and 12.

## Testing several enzymes on the same sequence

In practice, you usually want to test more than one enzyme. `RestrictionBatch` lets you group enzymes together and analyze them as a set.
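Conceptually, a batch search runs one search per enzyme and collects the results in a dictionary keyed by enzyme. The sketch below mimics that shape in plain Python. The `ENZYMES` table of sites and cut offsets is a hypothetical stand-in for Biopython's enzyme objects and only covers three enzymes that cut inside palindromic sites.

```python
# Hypothetical stand-in for enzyme objects: name -> (recognition site, cut offset)
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def batch_search(sequence: str, enzymes: dict[str, tuple[str, int]]) -> dict[str, list[int]]:
    """Map each enzyme name to 1-based positions of the first base after each cut."""
    sequence = sequence.upper()
    results = {}
    for name, (site, cut_offset) in enzymes.items():
        positions = []
        start = sequence.find(site)
        while start != -1:
            positions.append(start + cut_offset + 1)
            start = sequence.find(site, start + 1)
        results[name] = positions
    return results


# Sorting by name gives stable, readable output
for name, positions in sorted(batch_search("AAGAATTCGGATCCAAGCTTGAATTC", ENZYMES).items()):
    print(f"{name}: {positions}")
```

For this sequence the sketch prints `BamHI: [10]`, `EcoRI: [4, 22]` and `HindIII: [16]`, the same positions a Biopython batch search reports.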
```python
from Bio.Seq import Seq
from Bio.Restriction import EcoRI, BamHI, HindIII, RestrictionBatch

sequence = Seq("AAGAATTCGGATCCAAGCTTGAATTC")
enzyme_batch = RestrictionBatch([EcoRI, BamHI, HindIII])

# One call searches the sequence with every enzyme in the batch
results = enzyme_batch.search(sequence)
for enzyme, positions in results.items():
    print(f"{enzyme}: {positions}")
```

Output:

```
EcoRI: [4, 22]
BamHI: [10]
HindIII: [16]
```
This code creates a batch containing three enzymes and searches the sequence once. The result is a dictionary mapping each enzyme to its list of cut positions. Because a batch behaves like a set, the enzymes may be listed in a different order than shown here; sort them by name if you need stable output.

## Using `Analysis` for a formatted digest summary

Biopython also includes an `Analysis` class that gives you a more structured way to inspect which enzymes cut a sequence.
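The filters that `Analysis` offers amount to selecting entries from an enzyme-to-positions mapping. The following plain-Python sketch shows that idea on a dictionary keyed by enzyme name. The function names mirror the `Analysis` methods for readability, but they are our own helpers, not part of Biopython.

```python
def with_sites(results: dict[str, list[int]]) -> dict[str, list[int]]:
    """Keep enzymes that cut at least once."""
    return {name: pos for name, pos in results.items() if pos}


def without_site(results: dict[str, list[int]]) -> dict[str, list[int]]:
    """Keep enzymes that do not cut at all."""
    return {name: pos for name, pos in results.items() if not pos}


def with_n_sites(results: dict[str, list[int]], n: int) -> dict[str, list[int]]:
    """Keep enzymes that cut exactly n times."""
    return {name: pos for name, pos in results.items() if len(pos) == n}


# A mapping shaped like a batch search result (names stand in for enzymes)
results = {"EcoRI": [4, 22], "BamHI": [10], "HindIII": [16], "PstI": []}
print(sorted(with_sites(results)))       # ['BamHI', 'EcoRI', 'HindIII']
print(sorted(without_site(results)))     # ['PstI']
print(sorted(with_n_sites(results, 1)))  # ['BamHI', 'HindIII']
```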
```python
from Bio.Seq import Seq
from Bio.Restriction import EcoRI, BamHI, HindIII, PstI, RestrictionBatch, Analysis

sequence = Seq("AAGAATTCGGATCCAAGCTTGAATTC")
# PstI is included so that at least one enzyme in the batch does not cut
batch = RestrictionBatch([EcoRI, BamHI, HindIII, PstI])
analysis = Analysis(batch, sequence)


def show(mapping):
    """Print enzyme names and cut positions, sorted by enzyme name."""
    if not mapping:
        print("  (none)")
    for enzyme in sorted(mapping, key=str):
        print(f"  {enzyme}: {mapping[enzyme]}")


print("Full analysis:")
show(analysis.full())
print("\nEnzymes that cut at least once:")
show(analysis.with_sites())
print("\nEnzymes that do not cut:")
show(analysis.without_site())
```

Output:

```
Full analysis:
  BamHI: [10]
  EcoRI: [4, 22]
  HindIII: [16]
  PstI: []

Enzymes that cut at least once:
  BamHI: [10]
  EcoRI: [4, 22]
  HindIII: [16]

Enzymes that do not cut:
  PstI: []
```
This example builds an `Analysis` object from a batch and a sequence. The method `full()` returns every enzyme in the batch with its cut positions (an empty list if it does not cut). The method `with_sites()` keeps only enzymes that cut the sequence, while `without_site()` returns the enzymes that do not. All three return dictionaries keyed by enzyme.

## Finding enzymes that cut exactly once

A common cloning question is: which enzymes cut a sequence exactly one time? Single cutters are especially useful in plasmid design because they open the molecule at one predictable position.
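A typical follow-up combines two searches: enzymes that cut a vector exactly once but leave the insert intact. Here is a minimal sketch over two name-to-positions dictionaries. The function name `unique_cutters_sparing_insert` and the example positions are hypothetical, made up for illustration.

```python
def unique_cutters_sparing_insert(
    vector_results: dict[str, list[int]],
    insert_results: dict[str, list[int]],
) -> list[str]:
    """Names of enzymes that cut the vector exactly once and do not cut the insert.

    Enzymes missing from `insert_results` are treated as non-cutters (an assumption).
    """
    return sorted(
        name
        for name, positions in vector_results.items()
        if len(positions) == 1 and not insert_results.get(name)
    )


# Made-up positions, for illustration only
vector = {"EcoRI": [100], "BamHI": [200], "HindIII": [300, 900], "PstI": []}
insert = {"EcoRI": [50], "BamHI": [], "HindIII": [], "PstI": [80]}
print(unique_cutters_sparing_insert(vector, insert))  # ['BamHI']
```

With Biopython, you could build both dictionaries from two batch searches, converting each enzyme key with `str(enzyme)`.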
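```python
from Bio.Seq import Seq
from Bio.Restriction import EcoRI, BamHI, HindIII, PstI, RestrictionBatch, Analysis

sequence = Seq("AAGAATTCGGATCCAAGCTTGAATTCCTGCAG")
batch = RestrictionBatch([EcoRI, BamHI, HindIII, PstI])
analysis = Analysis(batch, sequence)

# with_N_sites(1) keeps only the enzymes with exactly one cut position
single_cutters = analysis.with_N_sites(1)
print("Enzymes cutting exactly once:")
for enzyme in sorted(single_cutters, key=str):
    print(f"  {enzyme}: {single_cutters[enzyme]}")
```

Output:

```
Enzymes cutting exactly once:
  BamHI: [10]
  HindIII: [16]
  PstI: [32]
```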
This code filters the batch down to the enzymes with exactly one cut site in the sequence. `EcoRI` is left out because it cuts twice.

## Getting the recognition site and cut behavior

Each restriction enzyme object contains useful metadata. You can inspect its recognition sequence, whether that site is palindromic, and what kind of ends the enzyme leaves after cutting.
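A recognition site is palindromic when it reads the same as its reverse complement, which is the property `is_palindromic()` reports for an enzyme. A plain-Python sketch of that check, assuming an unambiguous A/C/G/T site:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return site.upper().translate(COMPLEMENT)[::-1]


def is_palindromic_site(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site.upper() == reverse_complement(site)


# GGTCTC is included as a non-palindromic example
for site in ["GAATTC", "GGTACC", "GGTCTC"]:
    print(site, reverse_complement(site), is_palindromic_site(site))
```

This prints `GAATTC GAATTC True`, `GGTACC GGTACC True` and `GGTCTC GAGACC False`.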
```python
from Bio.Restriction import EcoRI, KpnI

for enzyme in [EcoRI, KpnI]:
    print("Enzyme:", enzyme)
    print("Recognition site:", enzyme.site)
    print("Is palindromic?:", enzyme.is_palindromic())
    # cut_once() is True for enzymes that cut each strand a single time
    print("Cuts once on each strand?:", enzyme.cut_once())
    print("Produces blunt ends?:", enzyme.is_blunt())
    print("Produces 5' overhang?:", enzyme.is_5overhang())
    print("Produces 3' overhang?:", enzyme.is_3overhang())
    print("-" * 40)
```

Output:

```
Enzyme: EcoRI
Recognition site: GAATTC
Is palindromic?: True
Cuts once on each strand?: True
Produces blunt ends?: False
Produces 5' overhang?: True
Produces 3' overhang?: False
----------------------------------------
Enzyme: KpnI
Recognition site: GGTACC
Is palindromic?: True
Cuts once on each strand?: True
Produces blunt ends?: False
Produces 5' overhang?: False
Produces 3' overhang?: True
----------------------------------------
```
This example shows how enzyme classes store biological properties. Those properties help you choose enzymes based on the kind of ends they produce.

## Simulating a simple digest by cutting a sequence into fragments

The restriction module tells you where enzymes cut. You can then use those cut positions to split the sequence into fragments. Because the positions are 1-based and point at the first base after each cut, subtract 1 to turn them into Python slice indices.
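If you only need fragment sizes, for example to predict bands on a gel, you can compute them from the cut positions alone. A minimal sketch, assuming the positions follow the `search()` convention (1-based, first base after each cut) and are sorted. The circular branch assumes the positions came from a circular search, which Biopython's `search()` supports through its `linear=False` argument.

```python
def fragment_lengths(seq_len: int, cuts: list[int], circular: bool = False) -> list[int]:
    """Fragment lengths from sorted 1-based cut positions (first base after each cut)."""
    if not cuts:
        # No cut: the whole molecule stays in one piece.
        return [seq_len]
    if circular:
        # Each fragment runs from one cut to the next; the last one wraps around.
        lengths = [b - a for a, b in zip(cuts, cuts[1:])]
        lengths.append(seq_len - cuts[-1] + cuts[0])
        return lengths
    # Linear: add the sequence start (1) and the position just past the end.
    boundaries = [1] + cuts + [seq_len + 1]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]


print(fragment_lengths(21, [5, 14]))                 # [4, 9, 8]
print(fragment_lengths(21, [5, 14], circular=True))  # [9, 12]
```

Note that a circular molecule cut once yields a single linear fragment of full length, while a linear molecule cut once yields two.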
```python
from Bio.Seq import Seq
from Bio.Restriction import EcoRI

sequence = Seq("TTTGAATTCAAAGAATTCGGG")
cut_positions = EcoRI.search(sequence)
print("Cut positions:", cut_positions)

fragments = []
start = 0
for cut in cut_positions:
    # Convert the 1-based "first base after the cut" into a 0-based slice index
    end = cut - 1
    fragments.append(sequence[start:end])
    start = end
fragments.append(sequence[start:])

print("Fragments after EcoRI digest:")
for i, fragment in enumerate(fragments, start=1):
    print(f"Fragment {i}: {fragment} (length {len(fragment)})")
```

Output:

```
Cut positions: [5, 14]
Fragments after EcoRI digest:
Fragment 1: TTTG (length 4)
Fragment 2: AATTCAAAG (length 9)
Fragment 3: AATTCGGG (length 8)
```
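This code uses the cut positions returned by `EcoRI.search()` to split the sequence into fragments. It models only the top strand, so single-stranded overhangs are not represented, but it is a simple way to model a digest and inspect fragment lengths. Biopython also provides `EcoRI.catalyse(sequence)`, which returns the top-strand fragments directly.

## Comparing multiple enzymes by number of cut sites

When choosing enzymes, it is often helpful to compare how frequently each one cuts.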
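As a rough guide to how often to expect a cut: a specific n-base site occurs about once every 4^n bases in random DNA, so a 6-base cutter such as `EcoRI` is expected roughly once per 4,096 bp and a 4-base cutter roughly once per 256 bp. The sketch below computes that expectation under exactly those simplifying assumptions (independent bases with equal frequencies). Real sequences often deviate substantially.

```python
def expected_site_count(seq_len: int, site_len: int, palindromic: bool = True) -> float:
    """Expected number of occurrences of a fixed site in random DNA.

    Assumes independent bases with equal frequencies (1/4 each). A
    non-palindromic site can also occur on the reverse strand, so its
    expectation is doubled.
    """
    windows = max(seq_len - site_len + 1, 0)
    expected = windows / 4 ** site_len
    return expected if palindromic else 2 * expected


print(expected_site_count(4101, 6))    # 1.0
print(expected_site_count(10_000, 4))  # 39.05078125
```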
```python
from Bio.Seq import Seq
from Bio.Restriction import EcoRI, BamHI, HindIII, PstI, SmaI, RestrictionBatch

sequence = Seq("GAATTCAAGCTTGGATCCCTGCAGCCCGGGGAATTC")
batch = RestrictionBatch([EcoRI, BamHI, HindIII, PstI, SmaI])
results = batch.search(sequence)

# Sort by number of cuts, then by name, so the comparison is easy to read
for enzyme in sorted(results, key=lambda e: (len(results[e]), str(e))):
    positions = results[enzyme]
    print(f"{enzyme} cuts {len(positions)} time(s): {positions}")
```

Output:

```
BamHI cuts 1 time(s): [14]
HindIII cuts 1 time(s): [8]
PstI cuts 1 time(s): [24]
SmaI cuts 1 time(s): [28]
EcoRI cuts 2 time(s): [2, 32]
```
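This example counts the cut sites for each enzyme and lists the enzymes from fewest to most cuts, so you can compare them quickly.

## Working with all known enzymes

Biopython includes a large collection of enzymes derived from the REBASE database. You can search with a predefined set such as `AllEnzymes` (every enzyme Biopython knows about) or `CommOnly` (only commercially available enzymes).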
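With a collection as large as `AllEnzymes`, it often helps to summarize before printing. This sketch works on a name-to-positions dictionary (for example one built with `{str(e): p for e, p in results.items()}`). It counts how many enzymes cut zero, one, two or more times and lists the single cutters alphabetically. The helper name is hypothetical.

```python
from collections import Counter


def summarize_cut_counts(results: dict[str, list[int]]) -> tuple[Counter, list[str]]:
    """Return (number of cuts -> number of enzymes, sorted names of single cutters)."""
    histogram = Counter(len(positions) for positions in results.values())
    single_cutters = sorted(name for name, positions in results.items() if len(positions) == 1)
    return histogram, single_cutters


example = {"EcoRI": [2, 32], "BamHI": [14], "HindIII": [8], "PstI": [24], "SmaI": [28], "KpnI": []}
histogram, single_cutters = summarize_cut_counts(example)
for n_cuts in sorted(histogram):
    print(f"{histogram[n_cuts]} enzyme(s) cut {n_cuts} time(s)")
print("Single cutters:", ", ".join(single_cutters))
```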
```python
from Bio.Seq import Seq
from Bio.Restriction import AllEnzymes

sequence = Seq("GAATTCAAGCTTGGATCCCTGCAGCCCGGGGAATTC")
results = AllEnzymes.search(sequence)

# Keep only the enzymes with at least one cut position
enzymes_with_sites = {enzyme: positions for enzyme, positions in results.items() if positions}
print("Number of enzymes that cut this sequence:", len(enzymes_with_sites))

# Print just the first ten to keep the output short
for enzyme, positions in list(enzymes_with_sites.items())[:10]:
    print(f"{enzyme}: {positions}")
```

Example output:

```
Number of enzymes that cut this sequence: 116
FspEI: [12, 13, 14, 31, 32]
Nli3877I: [30]
BspMAI: [24]
PsuGI: [25, 26]
SatI: [23]
AgsI: [7]
YkrI: [4, 5, 12, 15, 17, 19, 20, 21, 21, 22, 28, 29, 30, 33, 36]
BthCI: [25]
HindIII: [8]
AlwI: [9, 22]
```
This code searches the sequence against the full built-in enzyme collection. Because there can be many matches, it filters to enzymes that actually cut and prints only the first ten. The count and the enzymes listed depend on your Biopython version, and their order is arbitrary, so your output may differ from this example.

## Creating a sequence from FASTA data before analysis

In real work, your DNA sequence often comes from a FASTA file. You can read the sequence with `SeqIO` and then analyze it with restriction enzymes.
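Text read from files may contain lowercase letters, line breaks, or even protein sequence. A minimal pre-check, assuming you only want to accept unambiguous bases plus `N`; widen the allowed alphabet if your data use other IUPAC ambiguity codes. Note that a protein made only of the letters A, C, G and T would still pass this check.

```python
ALLOWED = set("ACGTN")


def clean_dna(text: str) -> str:
    """Uppercase the text, drop whitespace, and reject non-DNA characters."""
    cleaned = "".join(text.split()).upper()
    bad = sorted(set(cleaned) - ALLOWED)
    if bad:
        raise ValueError(f"Unexpected characters for DNA: {''.join(bad)}")
    return cleaned


print(clean_dna("aagaattc\nggatcc"))  # AAGAATTCGGATCC
try:
    clean_dna("MKTAYIAKQR")  # looks like protein
except ValueError as err:
    print(err)  # Unexpected characters for DNA: IKMQRY
```

The cleaned string can then be wrapped in `Seq()` before running a restriction search.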
```python
from Bio import SeqIO
from Bio.Restriction import EcoRI, BamHI, HindIII, RestrictionBatch

# Write a small FASTA file so the example is self-contained
with open("example.fasta", "w", encoding="utf-8") as handle:
    handle.write(">example_sequence\nAAGAATTCGGATCCAAGCTTGAATTC\n")

# SeqIO.read expects exactly one record in the file
record = SeqIO.read("example.fasta", "fasta")
sequence = record.seq

batch = RestrictionBatch([EcoRI, BamHI, HindIII])
results = batch.search(sequence)

print("Sequence ID:", record.id)
for enzyme, positions in results.items():
    print(f"{enzyme}: {positions}")
```

Output:

```
Sequence ID: example_sequence
EcoRI: [4, 22]
BamHI: [10]
HindIII: [16]
```
This example writes a small FASTA file, reads it back in, and uses the sequence for restriction analysis. In a real project, you would replace the example FASTA file with your own.

## Downloading a FASTA file from the web and checking for restriction sites

Restriction analysis often appears inside larger workflows. Here is an example that downloads a FASTA file and inspects it for a few enzymes. It needs network access and the third-party `requests` package.
```python
import requests
from Bio import SeqIO
from Bio.Restriction import EcoRI, BamHI, HindIII, RestrictionBatch

url = "https://raw.githubusercontent.com/biopython/biopython/master/Doc/examples/ls_orchid.fasta"
response = requests.get(url, timeout=30)
response.raise_for_status()  # stop early on HTTP errors

with open("downloaded_sequences.fasta", "w", encoding="utf-8") as handle:
    handle.write(response.text)

records = list(SeqIO.parse("downloaded_sequences.fasta", "fasta"))
first_record = records[0]
sequence = first_record.seq

batch = RestrictionBatch([EcoRI, BamHI, HindIII])
results = batch.search(sequence)

print("Downloaded sequence ID:", first_record.id)
print("Sequence length:", len(sequence))
for enzyme, positions in results.items():
    print(f"{enzyme}: {positions}")
```

Output:

```
Downloaded sequence ID: gi|2765658|emb|Z78533.1|CIZ78533
Sequence length: 740
EcoRI: []
BamHI: []
HindIII: []
```
This script downloads a FASTA file, reads the first sequence, and checks for common restriction sites; an empty list means that enzyme has no site in the sequence. It shows how Biopython’s restriction tools can fit into a real data pipeline.

## Finding blunt-end and sticky-end cutters

Sometimes you care less about the exact enzyme name and more about the type of ends produced after cutting.
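For an enzyme that recognizes a palindromic site and cuts inside it, the end type follows from where the top strand is cut: the bottom strand is cut at the mirror-image position, so the overhang length is the site length minus twice the cut offset. The sketch below applies that rule. The `cut_offset` values are written out by hand from familiar cutting patterns (for example `G^AATTC` for EcoRI and `A^GATCT` for BglII, which is not used elsewhere in this tutorial), and the function is illustrative, not how Biopython stores this information.

```python
def end_type(site: str, cut_offset: int) -> tuple[str, str]:
    """Classify the ends left by a palindromic site cut after `cut_offset` bases.

    Returns (end type, single-stranded region read 5'->3' along the top strand).
    """
    overhang = len(site) - 2 * cut_offset
    if overhang > 0:
        return "5' overhang", site[cut_offset:cut_offset + overhang]
    if overhang < 0:
        return "3' overhang", site[cut_offset + overhang:cut_offset]
    return "blunt end", ""


# Hand-entered examples: name -> (site, top-strand cut offset)
ENZYMES = {"EcoRI": ("GAATTC", 1), "BamHI": ("GGATCC", 1), "BglII": ("AGATCT", 1),
           "SmaI": ("CCCGGG", 3), "KpnI": ("GGTACC", 5)}
for name, (site, offset) in ENZYMES.items():
    kind, overhang = end_type(site, offset)
    print(f"{name}: {kind} {overhang}".rstrip())
```

Enzymes that leave identical overhangs, such as BamHI and BglII (both `GATC`), produce compatible sticky ends.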
```python
from Bio.Restriction import EcoRI, BamHI, SmaI, KpnI

enzymes = [EcoRI, BamHI, SmaI, KpnI]
for enzyme in enzymes:
    # is_blunt / is_5overhang / is_3overhang describe the ends left after cutting
    if enzyme.is_blunt():
        end_type = "blunt end"
    elif enzyme.is_5overhang():
        end_type = "5' overhang"
    elif enzyme.is_3overhang():
        end_type = "3' overhang"
    else:
        end_type = "unknown"
    print(f"{enzyme}: {end_type}")
```

Output:

```
EcoRI: 5' overhang
BamHI: 5' overhang
SmaI: blunt end
KpnI: 3' overhang
```
This example classifies enzymes by the kind of DNA ends they generate. That can help you decide which enzymes are compatible with a cloning strategy.

## Building a custom enzyme batch

A `RestrictionBatch` behaves like a set of enzymes rather than a list: each enzyme appears at most once, and you can add enzymes to build custom sets for repeated use.
```python
from Bio.Restriction import RestrictionBatch, EcoRI, BamHI

batch = RestrictionBatch()  # start with an empty batch
batch.add(EcoRI)
batch.add(BamHI)

print("Custom batch contents:")
# Batches are unordered, so sort by name for predictable output
for enzyme in sorted(batch, key=str):
    print(enzyme)
```

Output:

```
Custom batch contents:
BamHI
EcoRI
```
This code builds a batch step by step, which is useful when you want to construct a reusable set of enzymes for a project. Because a batch has no fixed order, the enzymes are sorted by name before printing.

## A reusable function for restriction analysis

As your code grows, it helps to wrap the analysis in a function.
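Once results are plain name-to-positions dictionaries, presenting them is ordinary Python. A hypothetical helper that renders such a dictionary as an aligned text table, sorted by enzyme name:

```python
def format_summary(summary: dict[str, list[int]]) -> str:
    """Render {enzyme name: cut positions} as an aligned, name-sorted table."""
    width = max([len("Enzyme"), *(len(name) for name in summary)])
    lines = [f"{'Enzyme':<{width}}  Cuts  Positions"]
    for name in sorted(summary):
        positions = summary[name]
        # Show "-" when the enzyme has no cut positions
        joined = ", ".join(map(str, positions)) or "-"
        lines.append(f"{name:<{width}}  {len(positions):>4}  {joined}")
    return "\n".join(lines)


print(format_summary({"EcoRI": [4, 22], "BamHI": [10], "HindIII": [16]}))
```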
```python
from Bio.Seq import Seq
from Bio.Restriction import RestrictionBatch, EcoRI, BamHI, HindIII


def summarize_digest(sequence_text, enzymes):
    """Return {enzyme name: cut positions} for a DNA string, sorted by name."""
    sequence = Seq(sequence_text)
    batch = RestrictionBatch(enzymes)
    results = batch.search(sequence)
    summary = {}
    for enzyme in sorted(results, key=str):
        summary[str(enzyme)] = results[enzyme]
    return summary


sequence = "AAGAATTCGGATCCAAGCTTGAATTC"
summary = summarize_digest(sequence, [EcoRI, BamHI, HindIII])
for enzyme_name, positions in summary.items():
    print(f"{enzyme_name}: {positions}")
```

Output:

```
BamHI: [10]
EcoRI: [4, 22]
HindIII: [16]
```
This function takes a DNA sequence as text and a list of enzyme classes, then returns a plain dictionary that maps each enzyme name to its cut positions, ordered by name.

## Common mistakes

### Using a protein sequence instead of a DNA sequence

Restriction enzymes act on DNA. If your sequence contains amino acids instead of nucleotide letters, the results will not make biological sense.

### Forgetting that cut positions are not the recognition start

The list returned by `search()` gives cut positions according to the enzyme’s cutting pattern: each number is the 1-based position of the first base after the cut. Do not treat those numbers as the first base of the recognition sequence, and subtract 1 before using them as Python slice indices.

### Mixing uppercase and lowercase letters

Biopython generally handles DNA sequence case well, but it is still a good habit to keep nucleotide sequences clean and consistent, for example by converting them to uppercase before analysis.

### Searching with too many enzymes too early

Using `AllEnzymes` can return a lot of information. When learning, it is easier to start with a small batch of familiar enzymes.

## When to use `RestrictionBatch` versus `Analysis`

Use `RestrictionBatch` when you mainly want to search a sequence with several enzymes and inspect the raw results. Use `Analysis` when you want built-in filtering, such as:

* enzymes that cut at least once
* enzymes that do not cut
* enzymes that cut exactly a certain number of times

Both are useful, and they work well together: an `Analysis` is built from a `RestrictionBatch`.

## Conclusion

Biopython’s `Bio.Restriction` module makes restriction enzyme analysis much easier to automate. You can search DNA sequences for cut sites, compare multiple enzymes, filter enzymes by how often they cut, and simulate simple digests directly in Python. This is especially helpful for cloning workflows, plasmid checks, and teaching molecular biology concepts with code.

Once you are comfortable with these tools, a good next step is to combine restriction analysis with sequence input from FASTA or GenBank files and build small scripts for plasmid or insert design.