Tutorials

Sparse Matrices with SciPy

Most large matrices that arise in practice — finite-difference grids, graph adjacency matrices, recommendation systems, natural language term-document matrices — are sparse: nearly all entries are zero. Storing them as dense NumPy arrays wastes enormous amounts of memory and slows down computation. `scipy.sparse` stores only the non-zero values and their positions, making it practical to work with matrices of millions of rows. As a bonus, sparse linear solvers like `spsolve` exploit the sparsity structure to solve systems far faster than dense methods.
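The savings are easy to quantify. As a quick sketch (using CSR's internal `data`, `indices`, and `indptr` arrays, which hold the stored values, their column indices, and the row pointers), here is the storage for a 10,000×10,000 tridiagonal matrix versus its dense equivalent:

```python
import numpy as np
import scipy.sparse as sp

n = 10_000
# Tridiagonal matrix in CSR form -- about 3n stored values
A = sp.diags([np.ones(n - 1), 2 * np.ones(n), np.ones(n - 1)],
             offsets=[-1, 0, 1], format="csr")

# CSR stores three arrays: values, column indices, and row pointers
sparse_bytes = A.data.nbytes + A.indices.nbytes + A.indptr.nbytes
dense_bytes = n * n * 8  # what a float64 dense array would take, without allocating it

print(f"Dense:  {dense_bytes / 1e6:.0f} MB")   # 800 MB
print(f"Sparse: {sparse_bytes / 1e3:.0f} KB")
```

The dense version needs 800 MB before a single operation runs; the sparse version fits comfortably in cache.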

### Creating Sparse Matrices

The two most convenient ways to build a sparse matrix are `diags`, for diagonal/banded structures, and `csr_matrix`, which converts a dense array. CSR (Compressed Sparse Row) is the standard format for arithmetic and linear solves.


import scipy.sparse as sp
import numpy as np

# Convert a small dense matrix to CSR
dense = np.array([[4, 0, 0, 2],
                  [0, 3, 0, 0],
                  [1, 0, 5, 0],
                  [0, 0, 0, 6]], dtype=float)
A = sp.csr_matrix(dense)
print(A)
print(f"\nShape: {A.shape},  Non-zeros: {A.nnz},  Density: {A.nnz / A.shape[0]**2:.2f}")
<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 6 stored elements and shape (4, 4)>
  Coords	Values
  (0, 0)	4.0
  (0, 3)	2.0
  (1, 1)	3.0
  (2, 0)	1.0
  (2, 2)	5.0
  (3, 3)	6.0

Shape: (4, 4),  Non-zeros: 6,  Density: 0.38
- `sp.csr_matrix(dense)` scans the dense array and stores only the 6 non-zero values — the rest are implicit zeros.
- `A.nnz` is the number of non-zero entries. Density = nnz / (rows × cols); values close to 0 mean the matrix is very sparse.
- Printing a CSR matrix shows `(row, col)  value` tuples — the internal storage representation.
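When the non-zeros arrive as scattered (row, column, value) triples — graph edges, ratings in a recommender — COO (coordinate) format is the natural entry point. A short sketch, rebuilding the same 4×4 matrix from triples and then converting to CSR:

```python
import numpy as np
import scipy.sparse as sp

# The same 4x4 matrix as above, assembled from coordinate triples
rows = np.array([0, 0, 1, 2, 2, 3])
cols = np.array([0, 3, 1, 0, 2, 3])
vals = np.array([4.0, 2.0, 3.0, 1.0, 5.0, 6.0])

A = sp.coo_matrix((vals, (rows, cols)), shape=(4, 4)).tocsr()
print(A.toarray())
```

A useful detail: duplicate (row, col) pairs are summed on conversion, which is exactly the accumulation behavior that finite-element assembly needs.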

### Building Large Sparse Matrices Efficiently

For large structured matrices — like the finite-difference Laplacian — `diags` is the right tool. It builds a sparse matrix from diagonals without ever allocating a dense array.

import scipy.sparse as sp
import numpy as np
import matplotlib.pyplot as plt

n = 50
# 1D finite-difference second-derivative matrix: [-1, 2, -1] tridiagonal
A = sp.diags([-np.ones(n-1), 2*np.ones(n), -np.ones(n-1)],
             offsets=[-1, 0, 1], format="csr")

print(f"Shape: {A.shape},  Non-zeros: {A.nnz},  Density: {A.nnz/n**2:.5f}")

plt.figure(figsize=(5, 5))
plt.spy(A, markersize=3, color="steelblue")
plt.title(f"Sparsity Pattern ({n}×{n} tridiagonal)")
plt.tight_layout()
plt.show()
Shape: (50, 50),  Non-zeros: 148,  Density: 0.05920
- `sp.diags([...], offsets=[...])` places each array on the specified diagonal: 0 is the main diagonal, −1 is one below, +1 is one above.
- `plt.spy(A)` visualizes the non-zero pattern — each dot represents a stored value. The banded structure is immediately visible.
- For n=50, the matrix has 50² = 2500 possible entries but only 148 non-zeros — less than 6% density. At n=10000, a dense matrix would need 800 MB; the sparse version needs under 1 MB.
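The same pattern scales to higher dimensions. One standard construction (sketched here with `sp.kron` and `sp.identity`) builds the 2D Laplacian on an n×n grid as Kronecker products of the 1D operator, again without any dense intermediate:

```python
import numpy as np
import scipy.sparse as sp

n = 50
T = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
             offsets=[-1, 0, 1], format="csr")
I = sp.identity(n, format="csr")

# 2D Laplacian: apply the 1D operator along each grid direction
A2 = sp.kron(I, T, format="csr") + sp.kron(T, I, format="csr")
print(f"Shape: {A2.shape},  Non-zeros: {A2.nnz},  Density: {A2.nnz / A2.shape[0]**2:.6f}")
```

The result is a 2500×2500 matrix with five non-zero diagonals — still well under 1% density.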

### Sparse Matrix Arithmetic

Sparse matrices support much of the NumPy arithmetic interface: `+`, scalar multiplication, `@` for matrix multiplication, and `.T` for transpose. One caveat: for the `csr_matrix` class, `*` means matrix multiplication, not element-wise multiplication — use `.multiply()` for element-wise products. Operations preserve sparsity where possible.

import scipy.sparse as sp
import numpy as np

n = 6
A = sp.diags([-np.ones(n-1), 2*np.ones(n), -np.ones(n-1)],
             offsets=[-1, 0, 1], format="csr")

x = np.ones(n)
b = A @ x  # sparse matrix-vector product

print("A @ ones =", b)

# Multiply two sparse matrices
B = A @ A
print(f"A²  non-zeros: {B.nnz}  (more fill-in than A with {A.nnz})")
A @ ones = [1. 0. 0. 0. 0. 1.]
A²  non-zeros: 24  (more fill-in than A with 16)
- `A @ x` works the same as for NumPy arrays but exploits sparsity — only non-zero entries are touched, so it's O(nnz) instead of O(n²).
- Multiplying two sparse matrices (`A @ A`) can introduce fill-in: entries that were zero in both operands become non-zero in the product. The result may be denser than the inputs.
- For the 1D Laplacian `A`, `A @ ones` gives `[1, 0, 0, ..., 0, 1]` — the boundary residuals of a finite-difference discretization.
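One pitfall worth a concrete example: plain `*` on a `csr_matrix` performs matrix multiplication, so element-wise products need `.multiply()`. A short sketch:

```python
import numpy as np
import scipy.sparse as sp

n = 6
A = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
             offsets=[-1, 0, 1], format="csr")

# Element-wise product: use .multiply(), not *
H = A.multiply(A)                # squares each stored entry
print(H.diagonal())              # main diagonal is 4.0 everywhere

# Scalar arithmetic and transpose preserve sparsity; A is symmetric here
B = 0.5 * A + A.T
print(np.allclose(B.toarray(), 1.5 * A.toarray()))
```

Because `.multiply()` only touches stored entries, the result is never denser than the sparser operand — the opposite of the fill-in behavior of `A @ A`.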

### Solving a Sparse Linear System

`spsolve` solves Ax = b for sparse A using a direct sparse LU factorization. It's dramatically faster than `numpy.linalg.solve` for large sparse systems because it exploits the zero structure.

import scipy.sparse as sp
import scipy.sparse.linalg as spla
import numpy as np
import matplotlib.pyplot as plt

# Solve the 1D Poisson equation: -u'' = f, u(0)=u(1)=0
n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

A = sp.diags([-np.ones(n-1), 2*np.ones(n), -np.ones(n-1)],
             offsets=[-1, 0, 1], format="csr") / h**2

f = np.sin(np.pi * x)           # right-hand side
u = spla.spsolve(A, f)          # numerical solution
u_exact = np.sin(np.pi * x) / np.pi**2  # exact: u = sin(πx)/π²

plt.figure(figsize=(8, 4))
plt.plot(x, u_exact, color="steelblue", linewidth=2.5, label="Exact")
plt.plot(x, u, color="crimson", linewidth=1.5, linestyle="--", label="spsolve")
plt.title("Sparse Linear Solve: 1D Poisson Equation")
plt.xlabel("x")
plt.ylabel("u(x)")
plt.legend()
plt.tight_layout()
plt.show()

print(f"Max error: {np.max(np.abs(u - u_exact)):.2e}")
Max error: 2.06e-06
- `spla.spsolve(A, f)` solves the 200×200 tridiagonal system in milliseconds; the same call on a dense matrix would be slower by orders of magnitude at larger n.
- The right-hand side `f = sin(πx)` has the exact solution `u = sin(πx)/π²` — a clean benchmark to verify accuracy.
- The maximum error is O(h²), confirming the expected second-order accuracy of the finite-difference scheme.
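For symmetric positive-definite systems too large even for a sparse direct factorization, iterative methods are the usual next step. A sketch using the conjugate gradient solver `spla.cg` (default tolerances assumed) on the same Poisson system:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

A = sp.diags([-np.ones(n - 1), 2 * np.ones(n), -np.ones(n - 1)],
             offsets=[-1, 0, 1], format="csr") / h**2
f = np.sin(np.pi * x)

# CG only ever needs matrix-vector products with A -- no factorization, no fill-in
u_cg, info = spla.cg(A, f)       # info == 0 means the iteration converged
print("Converged:", info == 0)
print(f"Max diff vs spsolve: {np.max(np.abs(u_cg - spla.spsolve(A, f))):.2e}")
```

Because CG never factors A, its memory footprint stays at O(nnz), whereas a direct LU factorization can suffer fill-in on 2D and 3D problems.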

Sparse matrices are essential whenever the problem size exceeds what dense methods can handle. For dense linear algebra — decompositions, eigenvalues, matrix functions — see [linear algebra with scipy.linalg](/tutorials/linear-algebra-with-scipy-linalg).