Nucleotide sequences (DNA or RNA) are central to many bioinformatics workflows. The **Biopython** library provides powerful tools for creating, manipulating, and analyzing biological sequences in Python.

In this tutorial, you’ll learn how to:

- Create nucleotide sequences
- Compute complements and reverse complements
- Transcribe DNA to RNA
- Translate DNA into protein sequences
- Perform simple sequence analysis

Biopython represents sequences using the `Seq` object from the `Bio.Seq` module.

---

## Creating a DNA Sequence

The `Seq` class represents a biological sequence and behaves similarly to a Python string, but with extra biological functionality.

```python
from Bio.Seq import Seq

# Create a DNA sequence
dna_seq = Seq("ATGCGTACGTTAGC")

# Print the sequence
print("DNA sequence:", dna_seq)

# Get sequence length
print("Length:", len(dna_seq))

# Access specific nucleotide
print("First nucleotide:", dna_seq[0])

# Slice the sequence
print("First five nucleotides:", dna_seq[:5])
```

Output:

```
DNA sequence: ATGCGTACGTTAGC
Length: 14
First nucleotide: A
First five nucleotides: ATGCG
```

**Explanation**
* **`Seq("ATGCGTACGTTAGC")`** creates a Biopython sequence object containing DNA nucleotides.
* **`len(dna_seq)`** returns the sequence length.
* **Indexing (`dna_seq[0]`)** accesses a single nucleotide.
* **Slicing (`dna_seq[:5]`)** extracts a portion of the sequence, just like with Python strings.
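
Because indexing and slicing follow Python's string rules, the usual slice idioms carry over. The short sketch below uses a plain string so that it runs without Biopython. The variable names are illustrative only. With a `Seq`, indexing returns a one-character string and slicing returns another `Seq`.

```python
# Plain-string sketch of common slice idioms; the same index
# expressions can be applied to a Seq object.
dna = "ATGCGTACGTTAGC"

last_base = dna[-1]        # negative indices count from the end
first_codon = dna[:3]      # first three nucleotides
every_third = dna[::3]     # a step of 3 picks positions 0, 3, 6, ...
reversed_dna = dna[::-1]   # a step of -1 reverses the sequence

print(last_base)      # C
print(first_codon)    # ATG
print(every_third)    # ACATG
print(reversed_dna)   # CGATTGCATGCGTA
```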
---
## Complement and Reverse Complement
DNA strands are complementary. Biopython provides built-in methods to compute complements.
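
To make the pairing rule concrete (A pairs with T, C pairs with G), here is a minimal plain-Python sketch of the two operations. The helper names `complement` and `reverse_complement` are illustrative stand-ins, not Biopython's implementation, and the sketch assumes an uppercase DNA string with no ambiguity codes.

```python
# Plain-Python sketch; assumes uppercase A/C/G/T only (an assumption
# made for brevity, not a limitation of Biopython itself).
COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Swap each base for its pairing partner, keeping the order."""
    return dna.translate(COMPLEMENT_TABLE)


def reverse_complement(dna: str) -> str:
    """Complement the sequence, then reverse it."""
    return complement(dna)[::-1]


print(complement("ATGCGTACGTTAGC"))          # TACGCATGCAATCG
print(reverse_complement("ATGCGTACGTTAGC"))  # GCTAACGTACGCAT
```

Complementing and reversing can be done in either order; the result is the same.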

```python
from Bio.Seq import Seq

# Define DNA sequence
dna_seq = Seq("ATGCGTACGTTAGC")

# Complement
complement = dna_seq.complement()

# Reverse complement
reverse_complement = dna_seq.reverse_complement()

print("Original:", dna_seq)
print("Complement:", complement)
print("Reverse complement:", reverse_complement)
```

Output:

```
Original: ATGCGTACGTTAGC
Complement: TACGCATGCAATCG
Reverse complement: GCTAACGTACGCAT
```

**Explanation**

* **`complement()`** replaces each nucleotide with its pair:
  * A ↔ T
  * C ↔ G
* **`reverse_complement()`** first reverses the sequence and then computes the complement.
* Reverse complements are commonly used when analyzing the opposite DNA strand.

---

## Transcribing DNA to RNA

Transcription converts DNA into RNA by replacing thymine (`T`) with uracil (`U`).
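
The T-to-U replacement can be sketched with plain strings. The sketch assumes the input is the coding strand, whose sequence matches the mRNA. For comparison, it also derives the same RNA from the template strand, which the RNA is the reverse complement of. Both helper names, `transcribe_coding` and `transcribe_template`, are hypothetical and exist only for this illustration.

```python
def transcribe_coding(coding_dna: str) -> str:
    """Coding strand -> mRNA: same sequence with T replaced by U."""
    return coding_dna.replace("T", "U")


def transcribe_template(template_dna: str) -> str:
    """Template strand (written 5'->3') -> mRNA.

    Each base is paired with its RNA partner (A->U, C->G, G->C, T->A)
    and the result is reversed so the mRNA also reads 5'->3'.
    """
    rna_pairs = str.maketrans("ACGT", "UGCA")
    return template_dna.translate(rna_pairs)[::-1]


coding = "ATGCGTACGTTAGC"
template = "GCTAACGTACGCAT"  # reverse complement of the coding strand

print(transcribe_coding(coding))      # AUGCGUACGUUAGC
print(transcribe_template(template))  # AUGCGUACGUUAGC
```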

```python
from Bio.Seq import Seq

# DNA sequence
dna_seq = Seq("ATGCGTACGTTAGC")

# Transcribe DNA to RNA
rna_seq = dna_seq.transcribe()

print("DNA:", dna_seq)
print("RNA:", rna_seq)
```

Output:

```
DNA: ATGCGTACGTTAGC
RNA: AUGCGUACGUUAGC
```

**Explanation**

* **`transcribe()`** converts DNA to RNA.
* Thymine (`T`) becomes uracil (`U`).
* Biopython treats the input as the coding strand, so the result matches the mRNA a cell would produce from the complementary template strand.

---

## Translating DNA into Protein

Biopython can translate nucleotide sequences into amino acid sequences.
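
Translation reads the sequence three nucleotides at a time and looks each triplet up in a codon table. The sketch below is a plain-Python illustration of that lookup using the standard genetic code, not Biopython's implementation. The names `CODON_TABLE` and `translate` are illustrative, and the `to_stop` flag is modeled on the Biopython option of the same name.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order:
# TTT, TTC, TTA, TTG, TCT, ... GGG. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = False) -> str:
    """Translate uppercase DNA codon by codon, starting at position 0.

    Trailing bases that do not form a full codon are ignored.
    With to_stop=True, translation ends before the first stop codon.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(dna))                # MAIVMGR*KGAR*
print(translate(dna, to_stop=True))  # MAIVMGR
```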

```python
from Bio.Seq import Seq

# DNA sequence with a start codon
dna_seq = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")

# Translate DNA into protein
protein = dna_seq.translate()

print("DNA:", dna_seq)
print("Protein:", protein)
```

Output:

```
DNA: ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG
Protein: MAIVMGR*KGAR*
```

**Explanation**

* **`translate()`** converts DNA into amino acids using the genetic code (the standard table by default).
* Translation reads nucleotides in groups of **three (codons)**.
* Each codon corresponds to one amino acid in the resulting protein sequence.
* Stop codons appear as `*` in the output. By default, translation continues past them to the end of the sequence.

---

## Counting Nucleotides

You can easily analyze nucleotide composition using standard Python methods.
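
One standard-library option is `collections.Counter`, which tallies every character in a single pass. The sketch below runs on a plain string; the same call should work on `str(dna_seq)`.

```python
from collections import Counter

# Tally all characters at once instead of calling count() per base.
counts = Counter("ATGCGTACGTTAGC")

for base in "ATGC":
    # Counter returns 0 for bases that never occur, so no KeyError.
    print(f"{base}: {counts[base]}")
```

A `Counter` also picks up unexpected symbols such as `N`, which makes it a quick check for non-standard characters.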

```python
from Bio.Seq import Seq

# DNA sequence
dna_seq = Seq("ATGCGTACGTTAGC")

# Convert to string for counting
seq_str = str(dna_seq)

# Count nucleotides
a_count = seq_str.count("A")
t_count = seq_str.count("T")
g_count = seq_str.count("G")
c_count = seq_str.count("C")

print("A:", a_count)
print("T:", t_count)
print("G:", g_count)
print("C:", c_count)
```

Output:

```
A: 3
T: 4
G: 4
C: 3
```

**Explanation**

* **`str(dna_seq)`** converts the Biopython sequence object to a normal Python string.
* The **`count()`** method counts occurrences of each nucleotide.
* This is useful for computing sequence composition or GC content.

---

## Calculating GC Content

GC content measures the proportion of guanine (`G`) and cytosine (`C`) bases in a sequence.

```python
from Bio.Seq import Seq

# DNA sequence
dna_seq = Seq("ATGCGTACGTTAGC")

# Convert to string
seq_str = str(dna_seq)

# Calculate GC content
gc_count = seq_str.count("G") + seq_str.count("C")
gc_content = gc_count / len(seq_str) * 100

print("GC content:", gc_content)
```

Output:

```
GC content: 50.0
```

**Explanation**
* **`seq_str.count("G") + seq_str.count("C")`** counts GC bases.
* The value is divided by the total sequence length.
* Multiplying by **100** converts it into a percentage.
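
The steps above can be wrapped in a small reusable function. This is a sketch with a hypothetical helper name, `gc_content`, and two added assumptions that are not part of the original example: lowercase input is counted too, and an empty sequence yields `0.0` rather than a division error.

```python
def gc_content(sequence: str) -> float:
    """Return the percentage of G and C bases in a sequence."""
    seq = sequence.upper()  # assumption: treat lowercase bases the same
    if not seq:
        return 0.0  # assumption: define GC content of an empty sequence as 0
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq) * 100


print(gc_content("ATGCGTACGTTAGC"))  # 50.0
```

A `Seq` object can be passed in as `gc_content(str(dna_seq))`.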
---
## Reading Sequences from a FASTA File
Biopython’s `SeqIO` module allows you to read sequences from common bioinformatics file formats.
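
For orientation, a FASTA file is plain text: each record starts with a header line beginning with `>`, followed by one or more lines of sequence. The sketch below is a minimal plain-Python reader written only to show that layout. The function name `parse_fasta` and the in-memory sample data are made up for this illustration, and it is not a substitute for `SeqIO`, which handles many more formats and edge cases.

```python
import io


def parse_fasta(handle):
    """Yield (record_id, sequence) pairs from an open FASTA text handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The ID is the first word after ">"; the rest is a description.
            words = line[1:].split()
            record_id, chunks = (words[0] if words else ""), []
        else:
            chunks.append(line)  # sequence data may span several lines
    if record_id is not None:
        yield record_id, "".join(chunks)


# Hypothetical sample data, not the file downloaded in this tutorial.
sample = io.StringIO(">seq1 first example\nATGC\nGTAC\n>seq2\nTTAG\n")
records = list(parse_fasta(sample))

for record_id, sequence in records:
    print(record_id, sequence, len(sequence))
```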

```python
from Bio import SeqIO
import requests

# Let's first download an example FASTA file to work with
url = "https://raw.githubusercontent.com/omgenomics/bio-data-zoo/refs/heads/main/data/fasta/good/basic_dna.fa"
response = requests.get(url)
response.raise_for_status()  # Stop here if the download failed

with open("example.fasta", "w") as f:
    f.write(response.text)

# Parse sequences from a FASTA file
for record in SeqIO.parse("example.fasta", "fasta"):
    print("ID:", record.id)
    print("Sequence:", record.seq)
    print("Length:", len(record.seq))
```

Output:

```
ID: sequence1
Sequence: AATTCTCATTACTGTATCACAGCAAGTTGTATTTACAACAAAAATCCAAA
Length: 50
ID: sequence2
Sequence: GCCTACCAGAAAACGTTGTATTTTGGCAAAGTTCAAAAAGTCAGTCCAGA
Length: 50
ID: sequence3
Sequence: GTATAATTCACAGAGTTTCATGTGGTTGTTGTTGACTCTACATATTGTCT
Length: 50
```

**Explanation**

* **`SeqIO.parse()`** reads sequences from a file.
* `"example.fasta"` is the input file.
* `"fasta"` specifies the file format.
* Each **`record`** contains metadata (`id`) and the biological sequence (`seq`).

---

## Conclusion

Biopython provides a powerful toolkit for working with nucleotide sequences in Python. With just a few lines of code, you can:

* Create DNA and RNA sequences
* Compute complements and reverse complements
* Transcribe and translate sequences
* Perform basic sequence analysis
* Read biological data from FASTA files

These capabilities form the foundation of many **bioinformatics workflows**, from genome analysis to protein prediction.