Translation is how nucleotide sequence becomes biologically interpretable protein sequence. In practice, you often need to inspect multiple reading frames and identify candidate open reading frames (ORFs), especially when annotation is incomplete. In this tutorial, you will translate DNA in different frames and detect ORFs with Biopython.

## Working with Reading Frames

```python
from Bio.Seq import Seq

# Example DNA sequence
sequence = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")


def translate_frame(seq, offset):
    """Translate seq from a 0-based offset, dropping any trailing partial codon."""
    frame = seq[offset:]
    # Trim to a multiple of three so every codon is complete
    frame = frame[: len(frame) - len(frame) % 3]
    return frame.translate(to_stop=False)


# Translate frame 1 (starting at position 0)
protein_frame_1 = translate_frame(sequence, 0)
# Translate frame 2 (starting at position 1)
protein_frame_2 = translate_frame(sequence, 1)
# Translate frame 3 (starting at position 2)
protein_frame_3 = translate_frame(sequence, 2)

print("Frame +1:", protein_frame_1)
print("Frame +2:", protein_frame_2)
print("Frame +3:", protein_frame_3)
```

```
Frame +1: MAIVMGR*KGAR*
Frame +2: WPL*WAAERVPD
Frame +3: GHCNGPLKGCPI
```

If you translate `sequence[1:]` or `sequence[2:]` without trimming, the output is the same, but Biopython emits `BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.`

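To make the regrouping of codons visible, the following sketch splits the same sequence into triplets at each offset using plain string slicing, without calling Biopython. The helper name `codons_in_frame` is illustrative and not part of any library.

```python
def codons_in_frame(seq, offset):
    """Split seq into complete codons from a 0-based offset (hypothetical helper)."""
    # Stop before any trailing bases that do not form a full codon
    usable = len(seq) - (len(seq) - offset) % 3
    return [seq[i:i + 3] for i in range(offset, usable, 3)]


sequence = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"

for offset in range(3):
    codons = codons_in_frame(sequence, offset)
    print(f"Frame +{offset + 1}: {' '.join(codons)}")
```

Reading the three lines side by side shows that the same bases fall into different triplets in each frame, which is why the three translations share no residues.
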
Shifting the start offset changes how the nucleotides are grouped into codons, and therefore the protein output. Frame-aware translation is essential because biologically valid proteins usually appear in only a subset of possible frames.

## Finding Open Reading Frames (ORFs) in DNA

```python
from Bio.Seq import Seq

# DNA sequence containing multiple potential starts/stops
dna = Seq("AAATGAAATAGATGCCCTAAATGGGGTTTGA")

# Scan every forward-strand position for a start codon (ATG), then walk
# codon by codon to the first in-frame stop codon
stops = {"TAA", "TAG", "TGA"}
orf_results = []
for i in range(0, len(dna) - 2):
    codon = str(dna[i:i+3])
    if codon == "ATG":
        for j in range(i + 3, len(dna) - 2, 3):
            stop_codon = str(dna[j:j+3])
            if stop_codon in stops:
                orf_seq = dna[i:j+3]
                protein = orf_seq.translate(to_stop=True)
                # Store 0-based start, exclusive end, ORF sequence, and protein
                orf_results.append((i, j + 3, str(orf_seq), str(protein)))
                break

print("ORFs found:", len(orf_results))
for start, end, orf_seq, protein in orf_results:
    print(f"Start={start}, End={end}, ORF={orf_seq}, Protein={protein}")
```

```
ORFs found: 2
Start=2, End=11, ORF=ATGAAATAG, Protein=MK
Start=11, End=20, ORF=ATGCCCTAA, Protein=MP
```

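As a cross-check that does not depend on Biopython, the same start-to-first-stop search can be sketched with a regular expression. This is one possible formulation rather than the tutorial's method; the pattern and the `frame` label are assumptions introduced here for illustration.

```python
import re

dna = "AAATGAAATAGATGCCCTAAATGGGGTTTGA"

# The lookahead lets overlapping or nested candidates be reported; the lazy
# quantifier ends each match at the first in-frame stop codon after ATG.
orf_pattern = re.compile(r"(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))")

orfs = []
for match in orf_pattern.finditer(dna):
    start = match.start(1)
    end = match.end(1)       # exclusive end, includes the stop codon
    frame = start % 3 + 1    # forward frame label: 1, 2, or 3
    orfs.append((start, end, frame, match.group(1)))

for start, end, frame, orf_seq in orfs:
    print(f"Start={start}, End={end}, Frame=+{frame}, ORF={orf_seq}")
```

The coordinates agree with the loop-based scan, and the extra frame label shows that both candidates lie in frame +3.
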
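The scan pairs each start codon (ATG) with the first in-frame stop codon downstream, then translates each ORF candidate. The reported `End` is exclusive, so `dna[Start:End]` is the ORF including its stop codon. The ATG at position 20 is not reported because no in-frame stop follows it before the sequence ends. ORF detection helps you move from raw DNA segments to candidate coding regions for annotation.

## Translating All Reading Frames
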
```python
from Bio.Seq import Seq

# Input DNA sequence
dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")


def trim_to_codons(seq):
    """Drop trailing bases that do not form a complete codon."""
    return seq[: len(seq) - len(seq) % 3]


# Positive-strand frames (+1, +2, +3)
forward_frames = [
    trim_to_codons(dna[i:]).translate(to_stop=False)
    for i in range(3)
]

# Reverse-complement strand frames (-1, -2, -3)
reverse = dna.reverse_complement()
reverse_frames = [
    trim_to_codons(reverse[i:]).translate(to_stop=False)
    for i in range(3)
]

for idx, protein in enumerate(forward_frames, start=1):
    print(f"Frame +{idx}: {protein}")
for idx, protein in enumerate(reverse_frames, start=1):
    print(f"Frame -{idx}: {protein}")
```

```
Frame +1: MAIVMGR*KGAR*
Frame +2: WPL*WAAERVPD
Frame +3: GHCNGPLKGCPI
Frame -1: LSGTLSAAHYNGH
Frame -2: YRAPFQRPITMA
Frame -3: IGHPFSGPLQWP
```

Translating all six frames gives a complete view of possible coding interpretations from both strands. This is especially useful in exploratory analysis or when gene orientation is unknown.

## Longest ORF Across All Six Frames

```python
from Bio.Seq import Seq

dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
stops = {"*"}


def longest_orf_protein(seq):
    """Return the longest stop-free peptide across the three frames of seq."""
    longest = ""
    for frame in range(3):
        sub = seq[frame:]
        # Drop any trailing partial codon before translating
        sub = sub[: len(sub) - len(sub) % 3]
        protein = str(sub.translate(to_stop=False))
        current = ""
        for aa in protein:
            if aa in stops:
                if len(current) > len(longest):
                    longest = current
                current = ""
            else:
                current += aa
        # Handle a stretch that runs to the end of the frame without a stop
        if len(current) > len(longest):
            longest = current
    return longest


forward_longest = longest_orf_protein(dna)
reverse_longest = longest_orf_protein(dna.reverse_complement())
best = max(forward_longest, reverse_longest, key=len)
print("Longest ORF protein candidate:", best)
print("Length:", len(best))
```

```
Longest ORF protein candidate: LSGTLSAAHYNGH
Length: 13
```

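The search above accepts any stop-free stretch. If a candidate should begin with methionine, one possible refinement is to trim each stretch to its first `M`. The sketch below applies that idea to the six translations printed in the previous section; `longest_m_initiated` is a hypothetical helper name, and a stretch that runs to the end of the sequence without a stop is still counted as a candidate.

```python
# Six-frame translations copied from the output shown earlier in this tutorial
six_frames = {
    "+1": "MAIVMGR*KGAR*",
    "+2": "WPL*WAAERVPD",
    "+3": "GHCNGPLKGCPI",
    "-1": "LSGTLSAAHYNGH",
    "-2": "YRAPFQRPITMA",
    "-3": "IGHPFSGPLQWP",
}


def longest_m_initiated(protein):
    """Return the longest stop-free stretch that starts at an M (hypothetical helper)."""
    best = ""
    for fragment in protein.split("*"):
        m_index = fragment.find("M")
        if m_index == -1:
            continue  # no start residue in this stretch
        candidate = fragment[m_index:]
        if len(candidate) > len(best):
            best = candidate
    return best


candidates = {label: longest_m_initiated(p) for label, p in six_frames.items()}
best_label = max(candidates, key=lambda label: len(candidates[label]))
print("Best frame:", best_label)
print("Candidate:", candidates[best_label])
```

With a start requirement, the top candidate shifts from the 13-residue stretch on the reverse strand to `MAIVMGR` in frame +1, which illustrates how much the ORF definition affects the result.
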
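Taking the longest ORF across six frames is a practical heuristic for candidate coding sequence discovery in unannotated fragments. Note that `longest_orf_protein` treats any stop-free stretch as an ORF and does not require a start codon, which is why the winning candidate here does not begin with M.

## Applying a Minimum ORF Length Filter
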
```python
from Bio.Seq import Seq

dna = Seq("AAATGAAATAGATGCCCTAAATGGGGTTTGA")
min_aa_length = 4
stops = {"TAA", "TAG", "TGA"}
filtered_orfs = []

for i in range(0, len(dna) - 2):
    if str(dna[i:i+3]) != "ATG":
        continue
    for j in range(i + 3, len(dna) - 2, 3):
        codon = str(dna[j:j+3])
        if codon in stops:
            orf_seq = dna[i:j+3]
            protein = str(orf_seq.translate(to_stop=True))
            if len(protein) >= min_aa_length:
                filtered_orfs.append((i, j + 3, protein))
            break

print("Filtered ORFs (>= min length):", filtered_orfs)
```

```
Filtered ORFs (>= min length): []
```

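Because the filtered list is empty, it can help to see how the threshold changes what survives. The sketch below sweeps a few toy thresholds over the two candidates reported by the unfiltered scan earlier in this tutorial; the `kept_counts` dictionary is introduced here only for illustration.

```python
# ORF candidates (start, end, protein) reported by the unfiltered scan
candidates = [(2, 11, "MK"), (11, 20, "MP")]

kept_counts = {}
for min_aa_length in (1, 2, 3, 4):
    kept = [orf for orf in candidates if len(orf[2]) >= min_aa_length]
    kept_counts[min_aa_length] = len(kept)
    print(f"min_aa_length={min_aa_length}: {len(kept)} ORF(s) kept")
```

On real sequences the threshold would be much larger than in this toy example, and the appropriate value depends on the organism and the analysis.
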
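Length filtering removes short, likely spurious ORFs and helps you prioritize biologically plausible candidates for annotation. In the example with `min_aa_length = 4`, both candidates (MK and MP) are only two residues long, so the filtered list is empty.

## Translating with an Alternative Genetic Code
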
```python
from Bio.Seq import Seq

# Example DNA where table choice can change interpretation
dna = Seq("ATGATAAAGAATAG")
# Keep only complete codons (the 14-nt input ends with two extra bases)
coding = dna[: len(dna) - len(dna) % 3]

# Standard code (table 1)
protein_standard = coding.translate(table=1, to_stop=False)
# Vertebrate mitochondrial code (table 2)
protein_mito = coding.translate(table=2, to_stop=False)

print("Standard table translation:", protein_standard)
print("Mitochondrial table translation:", protein_mito)
```

```
Standard table translation: MIKN
Mitochondrial table translation: MMKN
```

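To see which codons differ between two tables, Biopython exposes the table definitions in `Bio.Data.CodonTable`. The following is a minimal sketch, assuming table IDs 1 and 2 as in the translation above; the four codons inspected are chosen here for illustration.

```python
from Bio.Data import CodonTable

standard = CodonTable.unambiguous_dna_by_id[1]
mito = CodonTable.unambiguous_dna_by_id[2]

# forward_table maps sense codons to amino acids; stop codons are kept in a
# separate list, so a missing key is shown here as "*"
for codon in ("ATA", "TGA", "AGA", "AGG"):
    std_aa = standard.forward_table.get(codon, "*")
    mito_aa = mito.forward_table.get(codon, "*")
    print(f"{codon}: standard={std_aa}, mitochondrial={mito_aa}")

print("Standard stops:", sorted(standard.stop_codons))
print("Mitochondrial stops:", sorted(mito.stop_codons))
```

Checking the stop codons matters for ORF detection as well: a hard-coded set such as `{"TAA", "TAG", "TGA"}` only matches the standard code.
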
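Alternative code tables are essential for mitochondrial genomes and for organisms that use a non-canonical genetic code, where codon meanings differ from the standard table. In this example, ATA encodes isoleucine under the standard code but methionine under the vertebrate mitochondrial code.

## Frame-Level Translation Report
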
```python
from Bio.Seq import Seq

dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
reverse = dna.reverse_complement()

report = []
for strand_label, seq in [("+", dna), ("-", reverse)]:
    for frame in range(3):
        sub = seq[frame:]
        # Drop any trailing partial codon before translating
        sub = sub[: len(sub) - len(sub) % 3]
        protein = str(sub.translate(to_stop=False))
        # Stop-free fragments between "*" characters
        fragments = [frag for frag in protein.split("*") if frag]
        longest_fragment = max((len(f) for f in fragments), default=0)
        report.append(
            {
                "strand": strand_label,
                "frame": frame + 1,
                "protein_length": len(protein),
                "longest_orf_length": longest_fragment,
                "stop_count": protein.count("*"),
            }
        )

for row in report:
    print(row)
```

```
{'strand': '+', 'frame': 1, 'protein_length': 13, 'longest_orf_length': 7, 'stop_count': 2}
{'strand': '+', 'frame': 2, 'protein_length': 12, 'longest_orf_length': 8, 'stop_count': 1}
{'strand': '+', 'frame': 3, 'protein_length': 12, 'longest_orf_length': 12, 'stop_count': 0}
{'strand': '-', 'frame': 1, 'protein_length': 13, 'longest_orf_length': 13, 'stop_count': 0}
{'strand': '-', 'frame': 2, 'protein_length': 12, 'longest_orf_length': 12, 'stop_count': 0}
{'strand': '-', 'frame': 3, 'protein_length': 12, 'longest_orf_length': 12, 'stop_count': 0}
```

A frame summary report helps you compare coding potential systematically across all frames instead of inspecting each translation manually.
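
Once the report exists, the comparison itself can be automated by sorting the rows. The sketch below ranks frames by longest stop-free stretch, using fewer stops as a tie-breaker; this ranking rule is an assumption made for illustration, not a standard scoring scheme.

```python
# Rows copied from the report printed above
report = [
    {"strand": "+", "frame": 1, "protein_length": 13, "longest_orf_length": 7, "stop_count": 2},
    {"strand": "+", "frame": 2, "protein_length": 12, "longest_orf_length": 8, "stop_count": 1},
    {"strand": "+", "frame": 3, "protein_length": 12, "longest_orf_length": 12, "stop_count": 0},
    {"strand": "-", "frame": 1, "protein_length": 13, "longest_orf_length": 13, "stop_count": 0},
    {"strand": "-", "frame": 2, "protein_length": 12, "longest_orf_length": 12, "stop_count": 0},
    {"strand": "-", "frame": 3, "protein_length": 12, "longest_orf_length": 12, "stop_count": 0},
]

# Longest stop-free stretch first, then fewer stop codons
ranked = sorted(report, key=lambda row: (-row["longest_orf_length"], row["stop_count"]))

for row in ranked:
    print(f"{row['strand']}{row['frame']}: longest={row['longest_orf_length']}, stops={row['stop_count']}")
```

Because the report's `longest_orf_length` counts stop-free stretches regardless of a start codon, a high rank here marks a frame worth inspecting rather than a confirmed coding region.
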