Volcano plots show effect size and statistical significance in the same chart, which makes them useful for spotting observations that are both large in magnitude and statistically significant. Bokeh is a good fit for this kind of visualization because it adds interactivity such as hover inspection, toolbars, and browser-friendly rendering. This tutorial shows how to build a volcano plot in Bokeh, add significance cutoffs, color important groups, and improve the chart with hover tooltips.

### Basic Volcano Plot

Let's start with a simple volcano plot using simulated data.

```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

np.random.seed(0)  # make the sample data reproducible
output_notebook()  # render plots inline in a Jupyter notebook

# Sample data
log_fold_change = np.random.normal(0, 1.2, 200)
p_values = np.random.uniform(0.001, 1.0, 200)
y_values = -np.log10(p_values)  # smaller p-values map to larger y-values

p = figure(
    title="Basic Volcano Plot",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
)
p.scatter(log_fold_change, y_values, size=7, color="black", alpha=0.5)

show(p)
```

- The x-axis shows the effect size as log2 fold change.
- The y-axis shows statistical significance as `-log10(p-value)`, so smaller p-values appear higher on the chart.
- The p-values here are simulated independently of the fold changes, so the points will not form the V shape typical of real data.

### Adding Threshold Lines

Threshold lines make it easier to distinguish statistically significant points from background noise.
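Because the y-axis is on a `-log10` scale, a p-value cutoff has to be transformed the same way before it can be drawn as a horizontal line. The short sketch below prints where a few common cutoffs land on that axis. The helper name `neg_log10` is hypothetical and only the standard library is used.

```python
import math


def neg_log10(p_value):
    """Map a p-value onto the volcano plot's y-axis scale."""
    return -math.log10(p_value)


# Position of some common significance cutoffs on the y-axis
cutoffs = {p: round(neg_log10(p), 3) for p in (0.05, 0.01, 0.001)}
print(cutoffs)  # {0.05: 1.301, 0.01: 2.0, 0.001: 3.0}
```

A cutoff of 0.05 therefore sits at roughly 1.3 on the y-axis, which is where the horizontal line is placed in the plotting code that follows.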
```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.models import Span
from bokeh.plotting import figure, show

np.random.seed(1)
output_notebook()

# Sample data
log_fold_change = np.random.normal(0, 1.1, 250)
p_values = np.random.uniform(0.001, 1.0, 250)
y_values = -np.log10(p_values)

# Cutoffs: p < 0.05 and |log2 fold change| >= 1
p_cutoff = 0.05
fc_cutoff = 1.0

p = figure(
    title="Volcano Plot with Thresholds",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
)
p.scatter(log_fold_change, y_values, size=7, color="gray", alpha=0.5)

# dimension="width" spans the plot horizontally; "height" spans it vertically
horizontal_cutoff = Span(
    location=-np.log10(p_cutoff), dimension="width",
    line_color="red", line_dash="dashed", line_width=2,
)
left_cutoff = Span(
    location=-fc_cutoff, dimension="height",
    line_color="black", line_dash="dashed", line_width=1,
)
right_cutoff = Span(
    location=fc_cutoff, dimension="height",
    line_color="black", line_dash="dashed", line_width=1,
)
p.add_layout(horizontal_cutoff)
p.add_layout(left_cutoff)
p.add_layout(right_cutoff)

show(p)
```

- **`Span`** draws the horizontal and vertical threshold lines.
- Points above the horizontal line and outside the two vertical lines meet both the significance and the effect-size cutoffs.

### Coloring Significant Groups

Volcano plots are easier to interpret when significantly upregulated and downregulated points are visually separated from the background.
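The grouping rule is simple enough to check on a few hand-picked values before applying it to a full data set. The sketch below assumes the same cutoffs used in the threshold example (an absolute log2 fold change of at least 1 and p < 0.05). The function name `classify_points` and the sample values are hypothetical, and `np.select` is just one possible way to express the rule.

```python
import numpy as np


def classify_points(log_fold_change, p_values, fc_cutoff=1.0, p_cutoff=0.05):
    """Label each point as Upregulated, Downregulated, or Background."""
    significant = p_values < p_cutoff
    conditions = [
        significant & (log_fold_change >= fc_cutoff),
        significant & (log_fold_change <= -fc_cutoff),
    ]
    choices = ["Upregulated", "Downregulated"]
    return np.select(conditions, choices, default="Background")


# Hand-picked values covering each case
lfc = np.array([2.0, -1.5, 0.3, 1.2, -2.0])
pv = np.array([0.01, 0.03, 0.001, 0.2, 0.04])

status = classify_points(lfc, pv)
print(status.tolist())
# ['Upregulated', 'Downregulated', 'Background', 'Background', 'Downregulated']
```

The third point is significant but has a small effect, and the fourth has a large effect but is not significant, so both stay in the background. The plotting code that follows applies the same rule with boolean masks.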
```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

np.random.seed(2)
output_notebook()

# Sample data
log_fold_change = np.random.normal(0, 1.3, 300)
p_values = np.random.uniform(0.001, 1.0, 300)
y_values = -np.log10(p_values)

# Boolean masks for each group
upregulated = (log_fold_change >= 1) & (p_values < 0.05)
downregulated = (log_fold_change <= -1) & (p_values < 0.05)
background = ~(upregulated | downregulated)

p = figure(
    title="Volcano Plot with Highlighted Groups",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
)

# One scatter call per group so that each gets its own legend entry
p.scatter(log_fold_change[background], y_values[background],
          size=7, color="lightgray", alpha=0.5, legend_label="Background")
p.scatter(log_fold_change[upregulated], y_values[upregulated],
          size=8, color="crimson", alpha=0.8, legend_label="Upregulated")
p.scatter(log_fold_change[downregulated], y_values[downregulated],
          size=8, color="royalblue", alpha=0.8, legend_label="Downregulated")
p.legend.location = "top_left"

show(p)
```

- The three color groups separate non-significant, significantly upregulated, and significantly downregulated observations.
- This makes the main regions of interest much easier to scan.

### Adding Hover Tooltips

Bokeh becomes especially useful when you want to inspect points interactively.
```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure, show

np.random.seed(3)
output_notebook()

# Sample data
labels = [f"Gene {i}" for i in range(1, 101)]
log_fold_change = np.random.normal(0, 1.4, 100)
p_values = np.random.uniform(0.001, 1.0, 100)
y_values = -np.log10(p_values)

# Column names defined here are referenced by the glyph and the tooltips
source = ColumnDataSource(
    data={
        "label": labels,
        "log_fold_change": log_fold_change,
        "neg_log_p": y_values,
        "p_value": p_values,
    }
)

p = figure(
    title="Interactive Volcano Plot",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
    tools="pan,wheel_zoom,box_zoom,reset,save",
)
p.scatter(
    "log_fold_change",
    "neg_log_p",
    source=source,
    size=8,
    color="darkslategray",
    alpha=0.6,
)

# "@name" looks up a column in the source; "{...}" sets the number format
hover = HoverTool(
    tooltips=[
        ("Label", "@label"),
        ("Log2 fold change", "@log_fold_change{0.00}"),
        ("p-value", "@p_value{0.0000}"),
        ("-log10(p)", "@neg_log_p{0.00}"),
    ]
)
p.add_tools(hover)

show(p)
```

- **`ColumnDataSource`** stores the point data and makes each column available to the hover tooltips by name.
- Hover inspection is especially helpful when the chart contains too many points to label individually.

### Practical Example: Differential Expression Style Volcano Plot

Here is a more analysis-oriented example that combines thresholds, colors, and hover labels in one chart.
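A note on the fold-change cutoff used in this example: `fc_cutoff = 1.0` is on a log2 scale, so it corresponds to a doubling or halving rather than a one-unit change. The sketch below illustrates this with made-up measurements. The helper name `log2_fold_change` and the treated/control framing are assumptions for illustration, not part of the tutorial's data.

```python
import math


def log2_fold_change(treated, control):
    """Log2 ratio of two positive measurements (hypothetical helper)."""
    return math.log2(treated / control)


print(log2_fold_change(200.0, 100.0))  # 1.0  (doubled)
print(log2_fold_change(50.0, 100.0))   # -1.0 (halved)
print(log2_fold_change(100.0, 100.0))  # 0.0  (unchanged)
```

With a cutoff of 1.0, a point has to show at least a twofold change in either direction to fall outside the vertical threshold lines.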
```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.models import ColumnDataSource, HoverTool, Span
from bokeh.plotting import figure, show

np.random.seed(4)
output_notebook()

# Sample data
n = 400
labels = [f"Feature {i}" for i in range(1, n + 1)]
log_fold_change = np.random.normal(0, 1.35, n)
p_values = np.random.uniform(0.0001, 1.0, n)
y_values = -np.log10(p_values)

p_cutoff = 0.05
fc_cutoff = 1.0

# Assign each point a status based on both cutoffs
status = np.where(
    (log_fold_change >= fc_cutoff) & (p_values < p_cutoff),
    "Upregulated",
    np.where(
        (log_fold_change <= -fc_cutoff) & (p_values < p_cutoff),
        "Downregulated",
        "Background",
    ),
)

# Map each status to a color
colors = np.where(
    status == "Upregulated",
    "#D62728",
    np.where(status == "Downregulated", "#1F77B4", "#BDBDBD"),
)

source = ColumnDataSource(
    data={
        "label": labels,
        "log_fold_change": log_fold_change,
        "neg_log_p": y_values,
        "p_value": p_values,
        "status": status,
        "color": colors,
    }
)

p = figure(
    title="Differential Expression Style Volcano Plot",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=780,
    height=480,
    tools="pan,wheel_zoom,box_zoom,reset,save",
)

# color="color" reads the per-point color from the source column
p.scatter(
    "log_fold_change",
    "neg_log_p",
    source=source,
    size=7,
    color="color",
    alpha=0.7,
)

# Threshold lines: one for significance, two for effect size
p.add_layout(Span(location=-np.log10(p_cutoff), dimension="width", line_color="black", line_dash="dashed"))
p.add_layout(Span(location=-fc_cutoff, dimension="height", line_color="black", line_dash="dashed"))
p.add_layout(Span(location=fc_cutoff, dimension="height", line_color="black", line_dash="dashed"))

p.add_tools(
    HoverTool(
        tooltips=[
            ("Label", "@label"),
            ("Status", "@status"),
            ("Log2 fold change", "@log_fold_change{0.00}"),
            ("p-value", "@p_value{0.0000}"),
        ]
    )
)

show(p)
```

- This version is closer to what you would use in a real analysis workflow.
- Interactive hover details let you inspect specific features without labeling every point on the chart.

### Conclusion

Bokeh is a strong option for volcano plots when interactivity matters. By combining scatter markers, threshold lines, grouped colors, and hover inspection, you can turn a simple volcano plot into a practical exploratory analysis tool.