Volcano Plots in Bokeh

Volcano plots show effect size and statistical significance in the same chart. This makes them useful for spotting observations that are both large in magnitude and statistically significant. Bokeh suits this kind of chart because it renders in the browser and provides interactive features such as hover inspection and a toolbar for panning and zooming.

This tutorial shows how to build a volcano plot in Bokeh, add significance cutoffs, color important groups, and improve the chart with hover tooltips.

### Basic Volcano Plot

Let's start with a simple volcano plot using simulated data.

```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

np.random.seed(0)
output_notebook()

# Sample data
log_fold_change = np.random.normal(0, 1.2, 200)
p_values = np.random.uniform(0.001, 1.0, 200)
y_values = -np.log10(p_values)

p = figure(
    title="Basic Volcano Plot",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
)

p.scatter(log_fold_change, y_values, size=7, color="black", alpha=0.5)

show(p)
```

- The x-axis shows the effect size as a log2 fold change.
- The y-axis shows statistical significance as `-log10(p-value)`, so smaller p-values appear higher on the chart.
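You can check the y-axis transform by hand. Each tenfold drop in the p-value adds one unit of height, so with p-values drawn from [0.001, 1.0) as above, the points sit between just above 0 and 3. Here is a minimal sketch using only the standard library. The helper name `neg_log10` is hypothetical:

```python
import math

def neg_log10(p):
    """Return -log10(p), the y-coordinate used on a volcano plot."""
    if not 0 < p <= 1:
        raise ValueError("p-value must be in (0, 1]")
    return -math.log10(p)

# Smaller p-values map to larger heights on the chart.
for p in (0.5, 0.05, 0.01, 0.001):
    print(f"p = {p:<6} -> -log10(p) = {neg_log10(p):.2f}")
```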

### Adding Threshold Lines

Threshold lines make it easier to separate points that pass the significance and fold-change cutoffs from the background.

```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.plotting import figure, show
from bokeh.models import Span

np.random.seed(1)
output_notebook()

# Sample data
log_fold_change = np.random.normal(0, 1.1, 250)
p_values = np.random.uniform(0.001, 1.0, 250)
y_values = -np.log10(p_values)

p_cutoff = 0.05
fc_cutoff = 1.0

p = figure(
    title="Volcano Plot with Thresholds",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
)

p.scatter(log_fold_change, y_values, size=7, color="gray", alpha=0.5)

horizontal_cutoff = Span(
    location=-np.log10(p_cutoff), dimension="width",
    line_color="red", line_dash="dashed", line_width=2,
)
left_cutoff = Span(
    location=-fc_cutoff, dimension="height",
    line_color="black", line_dash="dashed", line_width=1,
)
right_cutoff = Span(
    location=fc_cutoff, dimension="height",
    line_color="black", line_dash="dashed", line_width=1,
)

p.add_layout(horizontal_cutoff)
p.add_layout(left_cutoff)
p.add_layout(right_cutoff)

show(p)
```

- **`Span`** is used to draw horizontal and vertical threshold lines.
- Points above the red horizontal line and outside the two vertical lines pass both the significance cutoff and the effect-size cutoff.
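The horizontal `Span` is placed at `-np.log10(p_cutoff)` because `-log10` is strictly decreasing. A point lies above that line exactly when its p-value is below the cutoff. The sketch below checks this equivalence for a few illustrative p-values. It uses only the standard library, and `P_CUTOFF` mirrors the `p_cutoff = 0.05` used above:

```python
import math

P_CUTOFF = 0.05
line_y = -math.log10(P_CUTOFF)  # height of the horizontal Span, about 1.301

p_values = [0.2, 0.06, 0.049, 0.01, 0.0005]
for p in p_values:
    y = -math.log10(p)
    above_line = y > line_y
    # Because -log10 is strictly decreasing, "above the line" matches "p < cutoff".
    print(f"p={p:<7} y={y:.3f} above_line={above_line} p<cutoff={p < P_CUTOFF}")
```

The vertical lines work the same way in the x direction. A point is outside them when `abs(log_fold_change) >= fc_cutoff`.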

### Coloring Significant Groups

Volcano plots are easier to interpret when significant upregulated and downregulated points are visually separated from the background.

```python
import numpy as np
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

np.random.seed(2)
output_notebook()

# Sample data
log_fold_change = np.random.normal(0, 1.3, 300)
p_values = np.random.uniform(0.001, 1.0, 300)
y_values = -np.log10(p_values)

upregulated = (log_fold_change >= 1) & (p_values < 0.05)
downregulated = (log_fold_change <= -1) & (p_values < 0.05)
background = ~(upregulated | downregulated)

p = figure(
    title="Volcano Plot with Highlighted Groups",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
)

p.scatter(log_fold_change[background], y_values[background], size=7, color="lightgray", alpha=0.5, legend_label="Background")
p.scatter(log_fold_change[upregulated], y_values[upregulated], size=8, color="crimson", alpha=0.8, legend_label="Upregulated")
p.scatter(log_fold_change[downregulated], y_values[downregulated], size=8, color="royalblue", alpha=0.8, legend_label="Downregulated")

p.legend.location = "top_left"

show(p)
```

- The three color groups separate non-significant, positively shifted, and negatively shifted observations.
- This makes the main regions of interest much easier to scan.
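The three boolean masks amount to a per-point rule. Writing that rule as a plain function can make the edge cases easier to see. Fold-change cutoffs are inclusive (`>=`, `<=`), while the p-value cutoff is strict (`<`). This is a sketch: `classify_point` and the sample pairs are hypothetical, and the default cutoffs match the 1 and 0.05 used above:

```python
from collections import Counter

def classify_point(log2_fc, p_value, p_cutoff=0.05, fc_cutoff=1.0):
    """Assign a point to the same three groups as the boolean masks above."""
    if p_value < p_cutoff and log2_fc >= fc_cutoff:
        return "Upregulated"
    if p_value < p_cutoff and log2_fc <= -fc_cutoff:
        return "Downregulated"
    return "Background"

# (log2 fold change, p-value) pairs chosen to hit each branch
data = [(2.1, 0.003), (-1.4, 0.02), (0.3, 0.01), (1.8, 0.3), (-0.9, 0.001)]
groups = [classify_point(fc, p) for fc, p in data]
print(groups)
print(Counter(groups))
```

The last two pairs stay in the background even though each passes one cutoff. A point must pass both to be highlighted.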

### Adding Hover Tooltips

Bokeh becomes especially useful when you want to inspect points interactively.

```python
import numpy as np
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

np.random.seed(3)
output_notebook()

# Sample data
labels = [f"Gene {i}" for i in range(1, 101)]
log_fold_change = np.random.normal(0, 1.4, 100)
p_values = np.random.uniform(0.001, 1.0, 100)
y_values = -np.log10(p_values)

source = ColumnDataSource(
    data={
        "label": labels,
        "log_fold_change": log_fold_change,
        "neg_log_p": y_values,
        "p_value": p_values,
    }
)

p = figure(
    title="Interactive Volcano Plot",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=700,
    height=450,
    tools="pan,wheel_zoom,box_zoom,reset,save",
)

p.scatter(
    "log_fold_change",
    "neg_log_p",
    source=source,
    size=8,
    color="darkslategray",
    alpha=0.6,
)

hover = HoverTool(
    tooltips=[
        ("Label", "@label"),
        ("Log2 fold change", "@log_fold_change{0.00}"),
        ("p-value", "@p_value{0.0000}"),
        ("-log10(p)", "@neg_log_p{0.00}"),
    ]
)

p.add_tools(hover)

show(p)
```

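- **`ColumnDataSource`** stores the point data as named columns. The tooltips refer to those columns with `@column_name`, and a suffix such as `{0.00}` controls how the number is formatted.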
- Hover inspection is especially helpful when the chart contains many points.
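`ColumnDataSource` expects a dictionary that maps each column name to a sequence, and all sequences must have the same length. Results often arrive as one record per feature instead, so a conversion step is common. The sketch below shows one way to do it in plain Python. `records_to_columns` and the sample records are hypothetical:

```python
def records_to_columns(records):
    """Convert a list of per-point dicts into the column dict a ColumnDataSource expects."""
    if not records:
        return {}
    keys = list(records[0])
    columns = {key: [] for key in keys}
    for row in records:
        # Every record must carry the same fields so all columns stay equal length.
        if set(row) != set(keys):
            raise ValueError(f"record has mismatched keys: {sorted(row)}")
        for key in keys:
            columns[key].append(row[key])
    return columns

records = [
    {"label": "Gene 1", "log_fold_change": 1.7, "p_value": 0.004},
    {"label": "Gene 2", "log_fold_change": -0.2, "p_value": 0.61},
]
data = records_to_columns(records)
print(data)
```

The resulting dict could then be passed as `ColumnDataSource(data=data)` after you add a `neg_log_p` column. If the results already live in a pandas DataFrame, `ColumnDataSource` also accepts the DataFrame directly.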

### Practical Example: Differential Expression Style Volcano Plot

Here is a more analysis-oriented example that combines thresholds, colors, and hover labels in one chart.

```python
import numpy as np
from bokeh.models import ColumnDataSource, HoverTool, Span
from bokeh.io import output_notebook
from bokeh.plotting import figure, show

np.random.seed(4)
output_notebook()

n = 400
labels = [f"Feature {i}" for i in range(1, n + 1)]
log_fold_change = np.random.normal(0, 1.35, n)
p_values = np.random.uniform(0.0001, 1.0, n)
y_values = -np.log10(p_values)

p_cutoff = 0.05
fc_cutoff = 1.0

status = np.where(
    (log_fold_change >= fc_cutoff) & (p_values < p_cutoff),
    "Upregulated",
    np.where(
        (log_fold_change <= -fc_cutoff) & (p_values < p_cutoff),
        "Downregulated",
        "Background",
    ),
)

colors = np.where(
    status == "Upregulated",
    "#D62728",
    np.where(status == "Downregulated", "#1F77B4", "#BDBDBD"),
)

source = ColumnDataSource(
    data={
        "label": labels,
        "log_fold_change": log_fold_change,
        "neg_log_p": y_values,
        "p_value": p_values,
        "status": status,
        "color": colors,
    }
)

p = figure(
    title="Differential Expression Style Volcano Plot",
    x_axis_label="Log2 Fold Change",
    y_axis_label="-log10(p-value)",
    width=780,
    height=480,
    tools="pan,wheel_zoom,box_zoom,reset,save",
)

p.scatter(
    "log_fold_change",
    "neg_log_p",
    source=source,
    size=7,
    color="color",
    alpha=0.7,
)

p.add_layout(Span(location=-np.log10(p_cutoff), dimension="width", line_color="black", line_dash="dashed"))
p.add_layout(Span(location=-fc_cutoff, dimension="height", line_color="black", line_dash="dashed"))
p.add_layout(Span(location=fc_cutoff, dimension="height", line_color="black", line_dash="dashed"))

p.add_tools(
    HoverTool(
        tooltips=[
            ("Label", "@label"),
            ("Status", "@status"),
            ("Log2 fold change", "@log_fold_change{0.00}"),
            ("p-value", "@p_value{0.0000}"),
        ]
    )
)

show(p)
```

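- Storing each point's status and color as columns in the `ColumnDataSource` means the same data drives both the marker colors and the tooltips, which is closer to a real analysis workflow.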
- Interactive hover details let you inspect specific features without labeling every point on the chart.
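The nested `np.where` calls work. However, `np.select` can express the same three-way rule as a list of conditions plus a default, which may stay more readable if you add groups later. This sketch reuses the example's seed, sample size, and cutoffs, so it reproduces the same arrays. The `palette` dict is a hypothetical name. The last lines count how many features fall in each group, a quick sanity check before plotting:

```python
import numpy as np

# Same simulated data and cutoffs as the example above
np.random.seed(4)
n = 400
log_fold_change = np.random.normal(0, 1.35, n)
p_values = np.random.uniform(0.0001, 1.0, n)
p_cutoff, fc_cutoff = 0.05, 1.0

significant = p_values < p_cutoff
conditions = [
    significant & (log_fold_change >= fc_cutoff),
    significant & (log_fold_change <= -fc_cutoff),
]
# The first matching condition wins; everything else becomes "Background".
status = np.select(conditions, ["Upregulated", "Downregulated"], default="Background")

palette = {"Upregulated": "#D62728", "Downregulated": "#1F77B4", "Background": "#BDBDBD"}
colors = [palette[s] for s in status]

groups, counts = np.unique(status, return_counts=True)
print(dict(zip(groups.tolist(), counts.tolist())))
```

If you want a legend, passing `legend_field="status"` to `p.scatter` should group the entries by that column.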

### Conclusion

Bokeh is a strong option for volcano plots when interactivity matters. By combining scatter markers, threshold lines, grouped colors, and hover inspection, you can turn a simple volcano plot into a more practical exploratory analysis tool.