Reverse complements and transcription are two of the most common sequence transformations in bioinformatics pipelines. They are essential when working with opposite strands, preparing coding regions, or moving from DNA to RNA-level analysis. In this tutorial, you will use Biopython sequence objects for these transformations.

## Reverse Complement of DNA
```python
from Bio.Seq import Seq

# Define a DNA sequence
seq = Seq("ATGCCGTTAACCGT")

# Compute complement and reverse complement
complement = seq.complement()
reverse_complement = seq.reverse_complement()

print("Original:", seq)
print("Complement:", complement)
print("Reverse complement:", reverse_complement)
```

```text
Original: ATGCCGTTAACCGT
Complement: TACGGCAATTGGCA
Reverse complement: ACGGTTAACGGCAT
```
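Since a motif can lie on either strand, a common use of the reverse complement is to search both orientations in one pass. The following is a minimal, dependency-free sketch of that idea in plain Python. The names `reverse_complement` and `find_motif_both_strands` are hypothetical helpers written for illustration, not Biopython functions, and the complement table covers only unambiguous bases.

```python
# Plain-Python sketch; helper names are hypothetical, not part of Biopython.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse the string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_motif_both_strands(dna: str, motif: str):
    """Return sorted (start, strand) hits in 0-based forward-strand coordinates.

    A hit with strand -1 means the motif's reverse complement occurs at
    `start` on the forward strand. A palindromic motif is reported once
    per strand.
    """
    hits = []
    for strand, pattern in ((1, motif), (-1, reverse_complement(motif))):
        start = dna.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = dna.find(pattern, start + 1)
    return sorted(hits)


seq = "ATGCCGTTAACCGT"
print(reverse_complement(seq))               # ACGGTTAACGGCAT
print(find_motif_both_strands(seq, "ACGG"))  # [(3, -1), (10, -1)]
```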
The `complement()` and `reverse_complement()` methods operate directly on a `Seq` object and return new `Seq` objects. The reverse complement matters most when a gene or motif lies on the opposite DNA strand, because it gives that strand's sequence read 5'->3'.

## Transcription (DNA -> RNA)
```python
from Bio.Seq import Seq

# DNA coding-strand example
dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# Transcribe thymine (T) to uracil (U)
rna = dna.transcribe()

print("DNA:", dna)
print("RNA:", rna)
```

```text
DNA: ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
RNA: AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
```
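`transcribe()` assumes its input is the coding strand. If the sequence at hand is the template strand (written 5'->3'), the usual approach is to reverse-complement it first and then transcribe. A small sketch of that case, reusing the sequence from this section; the variable names are illustrative only.

```python
from Bio.Seq import Seq

coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# The template strand is the reverse complement of the coding strand
# (both written 5'->3')
template_dna = coding_dna.reverse_complement()

# Recover the coding strand from the template, then swap T for U
rna_from_template = template_dna.reverse_complement().transcribe()

print("Template DNA:", template_dna)
print("RNA from template:", rna_from_template)
print("Matches coding-strand transcript:", rna_from_template == coding_dna.transcribe())
```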
Transcription here is a change of representation: `transcribe()` replaces each `T` with `U` and leaves the order of the bases unchanged. This is a common preprocessing step before translation or RNA-focused analyses.

## Back-Transcription (RNA -> DNA)
```python
from Bio.Seq import Seq

# Start from RNA
rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG")

# Convert RNA back to DNA representation
dna_back = rna.back_transcribe()

print("RNA:", rna)
print("Back-transcribed DNA:", dna_back)
```

```text
RNA: AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG
Back-transcribed DNA: ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
```
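When inputs arrive in a mix of DNA and RNA notation, one way to keep downstream tools consistent is to normalize everything to DNA up front. The sketch below assumes a hypothetical helper, `to_dna`, and made-up input strings. It relies on `back_transcribe()` only replacing `U` with `T`, so sequences already in DNA notation pass through unchanged.

```python
from Bio.Seq import Seq


def to_dna(sequence: str) -> Seq:
    """Hypothetical helper: return an uppercase DNA-notation Seq for DNA or RNA input."""
    return Seq(sequence.upper()).back_transcribe()


# Made-up inputs in mixed notation and case
mixed_inputs = ["AUGGCC", "ATGGCC", "augcgu"]
normalized = [str(to_dna(s)) for s in mixed_inputs]
print(normalized)
```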
Back-transcription is useful when integrating RNA-derived sequences with DNA-based tools and file formats. It keeps workflows consistent when different software expects different nucleotide alphabets.

## Batch Reverse-Complement Processing for FASTA Files
```python
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Create a small example FASTA file
records = [
    SeqRecord(seq=Seq("ATGCGTAC"), id="seq1", description=""),
    SeqRecord(seq=Seq("TTAACCGG"), id="seq2", description=""),
]
SeqIO.write(records, "input_dna.fasta", "fasta")

# Read, reverse-complement, and write output FASTA
revcomp_records = []
for record in SeqIO.parse("input_dna.fasta", "fasta"):
    rc = record.seq.reverse_complement()
    revcomp_records.append(SeqRecord(rc, id=record.id + "_rc", description="reverse_complement"))

SeqIO.write(revcomp_records, "output_revcomp.fasta", "fasta")
print("Wrote output_revcomp.fasta")
```

```text
Wrote output_revcomp.fasta
```
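For large FASTA files, building a full list of records in memory may be unnecessary. The sketch below streams records through a generator and uses `SeqRecord.reverse_complement()`, which accepts a new `id` and `description` directly. It reads from and writes to in-memory `StringIO` handles only to stay self-contained. The FASTA text and the function name `reverse_complement_records` are made up for illustration, and real file paths can be passed to `SeqIO.parse` and `SeqIO.write` instead.

```python
from io import StringIO

from Bio import SeqIO

# Made-up FASTA content standing in for a file on disk
fasta_text = ">seq1\nATGCGTAC\n>seq2\nTTAACCGG\n"


def reverse_complement_records(handle):
    """Yield one reverse-complemented SeqRecord at a time."""
    for record in SeqIO.parse(handle, "fasta"):
        yield record.reverse_complement(
            id=record.id + "_rc", description="reverse_complement"
        )


output = StringIO()
# SeqIO.write consumes the generator lazily and returns the number of records written
count = SeqIO.write(reverse_complement_records(StringIO(fasta_text)), output, "fasta")

print(f"Wrote {count} records")
print(output.getvalue())
```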
Batch processing is the practical pattern for real datasets with many sequences. It helps standardize strand orientation before downstream alignment or mapping.

## Primer-Oriented Reverse Complement Example
```python
from Bio.Seq import Seq

# Example primer pair for a target region
forward_primer = Seq("AGTCTGACCTGAACTG")
reverse_primer_template = Seq("TCAGGTTGCTAACGTA")

# Reverse primer used in PCR is reverse-complement of template-side sequence
reverse_primer = reverse_primer_template.reverse_complement()

print("Forward primer (5'->3'):", forward_primer)
print("Reverse primer template-side:", reverse_primer_template)
print("Reverse primer (5'->3'):", reverse_primer)
```

```text
Forward primer (5'->3'): AGTCTGACCTGAACTG
Reverse primer template-side: TCAGGTTGCTAACGTA
Reverse primer (5'->3'): TACGTTAGCAACCTGA
```
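A quick sanity check on a primer pair is to confirm that both primers have a binding site on the intended template and to read off the region they flank. The sketch below builds a hypothetical toy template from the two primer sites plus a made-up filler sequence, then locates the sites with `Seq.find()`. None of these sequences come from a real target.

```python
from Bio.Seq import Seq

forward_primer = Seq("AGTCTGACCTGAACTG")
reverse_primer = Seq("TACGTTAGCAACCTGA")

# On the top strand, the reverse primer's binding site appears as its reverse complement
reverse_site = reverse_primer.reverse_complement()

# Hypothetical toy template: forward site + made-up filler + reverse site
template = forward_primer + Seq("GGGAAACCC") + reverse_site

# find() returns the 0-based start of the first match, or -1 if absent
fwd_start = template.find(forward_primer)
rev_start = template.find(reverse_site)

if fwd_start != -1 and rev_start != -1:
    amplicon = template[fwd_start : rev_start + len(reverse_site)]
    print("Amplicon length:", len(amplicon))
    print("Amplicon:", amplicon)
else:
    print("At least one primer site was not found on the template")
```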
This is a common lab-facing use case: converting a template-side reverse-primer region into the sequence that should be synthesized.

## Strand-Aware Gene Extraction Before Transcription
```python
from Bio.Seq import Seq

# Toy genomic region and gene coordinates (0-based, end-exclusive, as in Python slicing)
genome = Seq("TTTATGAAACCCGGGTTTAAACCCATGCGTAAAGGG")
start, end = 5, 25
strand = -1  # gene annotated on reverse strand

gene_seq = genome[start:end]
if strand == -1:
    gene_seq = gene_seq.reverse_complement()

rna = gene_seq.transcribe()

print("Extracted coding DNA:", gene_seq)
print("Transcribed RNA:", rna)
```

```text
Extracted coding DNA: TGGGTTTAAACCCGGGTTTC
Transcribed RNA: UGGGUUUAAACCCGGGUUUC
```
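Biopython can also carry the strand inside a location object, which avoids a hand-written `if strand == -1` branch. A brief sketch using `FeatureLocation` from `Bio.SeqFeature` with the same toy genome and coordinates as the example in this section:

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation

genome = Seq("TTTATGAAACCCGGGTTTAAACCCATGCGTAAAGGG")

# 0-based, end-exclusive coordinates; strand=-1 marks the reverse strand
location = FeatureLocation(5, 25, strand=-1)

# extract() slices the parent sequence and reverse-complements it for strand -1
gene_seq = location.extract(genome)
rna = gene_seq.transcribe()

print("Extracted coding DNA:", gene_seq)
print("Transcribed RNA:", rna)
```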
In annotation pipelines, strand-aware extraction is critical. Without it, you transcribe the wrong orientation, and any later translation step yields a biologically incorrect protein.

## Handling Ambiguous Bases During Transformations
```python
from Bio.Seq import Seq

# Sequence with ambiguity codes
ambiguous_dna = Seq("ATGNCGTRYAAT")

print("Original DNA:", ambiguous_dna)
print("Reverse complement:", ambiguous_dna.reverse_complement())
print("RNA transcript:", ambiguous_dna.transcribe())
```

```text
Original DNA: ATGNCGTRYAAT
Reverse complement: ATTRYACGNCAT
RNA transcript: AUGNCGURYAAU
```
Ambiguous nucleotide codes appear often in consensus and low-confidence regions. Verifying transformation behavior on these inputs helps avoid subtle bugs in production workflows.
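
The reason `R` and `Y` swap in the reverse complement is that each IUPAC code is complemented as the set of bases it stands for: `R` (A or G) pairs with `Y` (T or C), while `N` and the self-complementary codes `S` and `W` map to themselves. A dependency-free sketch of that mapping in plain Python follows. `iupac_reverse_complement` is a hypothetical helper written for illustration, not the Biopython implementation.

```python
# IUPAC DNA codes and their complements, position by position:
# A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H; S, W and N are self-complementary.
_IUPAC_COMPLEMENT = str.maketrans(
    "ACGTRYSWKMBDHVN",
    "TGCAYRSWMKVHDBN",
)


def iupac_reverse_complement(dna: str) -> str:
    """Hypothetical helper: reverse complement that understands IUPAC ambiguity codes."""
    return dna.upper().translate(_IUPAC_COMPLEMENT)[::-1]


print(iupac_reverse_complement("ATGNCGTRYAAT"))  # ATTRYACGNCAT
```

Comparing such a hand-written table against the library output on a few ambiguous inputs is one inexpensive way to run the verification described above.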