Histograms are an essential tool for visualizing the distribution of a dataset. In this tutorial, we will walk through the steps to create and customize histograms using the Plotly library.

### Basic Histogram

To get started, let's create a simple histogram from a dataset.
```python
import plotly.express as px
import numpy as np

# Generate random data
np.random.seed(0)
data = np.random.randn(1000)

# Create a basic histogram; passing the array as x avoids wide-form handling
fig = px.histogram(x=data, title='Basic Histogram')

# Show the figure
fig.show()
```
- **`px.histogram`** groups the values into bins and draws one bar per bin showing how many values fall in it; **`title='Basic Histogram'`** sets the figure title.

### Customizing Histogram Bins

You can customize the number of bins and the bin width to better represent your data.
```python
import plotly.express as px
import numpy as np

# Generate random data
np.random.seed(0)
data = np.random.randn(1000)

# Create a histogram with up to 100 bins
fig = px.histogram(x=data, nbins=100, title='Histogram with Custom Bins')

# Show the figure
fig.show()
```
- **`nbins=100`** sets the maximum number of bins; Plotly picks a bin size that produces at most that many bins.

### Customizing Axis Labels and Titles

You can add a title and axis labels to make your plot more informative.
```python
import plotly.express as px
import numpy as np

# Generate random data
np.random.seed(0)
data = np.random.randn(500)

# Create a histogram with a title
fig = px.histogram(
    x=data,
    nbins=30,
    title='Histogram with Custom Labels',
)

# Set the axis labels
fig.update_layout(xaxis_title='Value', yaxis_title='Frequency')

# Show the figure
fig.show()
```
- **`title='Histogram with Custom Labels'`** sets the plot title.
- **`fig.update_layout(xaxis_title='Value', yaxis_title='Frequency')`** sets the axis labels.

### Creating Overlaid Histograms

You can overlay multiple histograms on the same axes to compare their distributions.
```python
import plotly.graph_objects as go
import numpy as np

# Generate two sets of random data; the second is shifted by +2
np.random.seed(0)
data1 = np.random.randn(500)
data2 = np.random.randn(500) + 2

# Create one histogram trace per dataset
fig = go.Figure()
fig.add_trace(go.Histogram(x=data1, nbinsx=30, name='Dataset 1', opacity=0.75))
fig.add_trace(go.Histogram(x=data2, nbinsx=30, name='Dataset 2', opacity=0.75))

# Overlay both histograms by setting the bar mode
fig.update_layout(
    barmode='overlay',
    title='Overlaid Histograms',
    xaxis_title='Value',
    yaxis_title='Frequency'
)

# Show the figure
fig.show()
```
- **`go.Figure()`** creates an empty figure to which traces are added.
- **`go.Histogram(x=data1, nbinsx=30, name='Dataset 1', opacity=0.75)`** defines the first histogram with at most 30 bins and 75% opacity, so overlapping bars stay visible.
- **`barmode='overlay'`** draws the histograms on top of each other instead of side by side.

### Creating Stacked Histograms

You can also stack multiple histograms.
```python
import plotly.graph_objects as go
import numpy as np

# Generate two sets of random data; the second is shifted by +2
np.random.seed(0)
data1 = np.random.randn(500)
data2 = np.random.randn(500) + 2

# Create one histogram trace per dataset
fig = go.Figure()
fig.add_trace(go.Histogram(x=data1, nbinsx=30, name='Dataset 1'))
fig.add_trace(go.Histogram(x=data2, nbinsx=30, name='Dataset 2'))

# Stack the histograms by setting the bar mode
fig.update_layout(
    barmode='stack',
    title='Stacked Histograms',
    xaxis_title='Value',
    yaxis_title='Frequency'
)

# Show the figure
fig.show()
```
- **`barmode='stack'`** stacks the bars of the two histograms on top of one another.

### Customizing Histogram Colors

You can set the color of each histogram explicitly.
```python
import plotly.graph_objects as go
import numpy as np

# Generate two sets of random data; the second is shifted by +2
np.random.seed(0)
data1 = np.random.randn(500)
data2 = np.random.randn(500) + 2

# Create histograms for both datasets with custom colors;
# opacity keeps the first histogram visible where the two overlap
fig = go.Figure()
fig.add_trace(go.Histogram(x=data1, nbinsx=30, name='Dataset 1', marker_color='blue', opacity=0.75))
fig.add_trace(go.Histogram(x=data2, nbinsx=30, name='Dataset 2', marker_color='orange', opacity=0.75))

# Overlay both histograms by setting the bar mode
fig.update_layout(
    barmode='overlay',
    title='Histograms with Custom Colors',
    xaxis_title='Value',
    yaxis_title='Frequency'
)

# Show the figure
fig.show()
```
- **`marker_color='blue'`** and **`marker_color='orange'`** set custom colors for the two histograms.

### Histogram with Density Estimation using `create_distplot`

The `create_distplot` function from Plotly's `figure_factory` module draws a histogram together with a smooth density curve that estimates the probability density function (PDF). It requires SciPy to be installed.
```python
import plotly.figure_factory as ff
import numpy as np

# Generate random data
np.random.seed(0)
data = np.random.randn(1000)

# Create a distplot
fig = ff.create_distplot([data], group_labels=['Data'], bin_size=0.2, show_hist=True, show_curve=True)

# Update layout
fig.update_layout(
    title='Histogram with Density Estimation',
    xaxis_title='Value',
    yaxis_title='Density'
)

# Show the figure
fig.show()
```
- **`ff.create_distplot([data], group_labels=['Data'], bin_size=0.2, show_hist=True, show_curve=True)`** creates a histogram with a density curve; it takes a list of datasets and one label per dataset.
- **`bin_size=0.2`** sets the bin width.
- **`show_hist=True`** shows the histogram.
- **`show_curve=True`** shows the density curve, which is a kernel density estimate (KDE) of the PDF by default.

By default, `create_distplot` also draws a rug plot of the individual values below the histogram.

### Conclusion

Using Plotly, you can create interactive, customizable histograms to visualize the distribution of your data. You can control binning, label axes, overlay or stack several datasets through the `barmode` layout setting, choose colors, and add density curves.