Tutorials

Spatial Algorithms with SciPy

Spatial algorithms answer geometric questions about sets of points: which points are closest to a query location, how far apart are all pairs of points, how do you partition space into regions of influence, and how do you triangulate an irregular point cloud? These problems appear in machine learning (k-nearest-neighbor classification), computational geometry, geographic analysis, mesh generation, and clustering. `scipy.spatial` provides efficient implementations of all these operations.

### Nearest Neighbor Search with KDTree

A KD-tree is a data structure that organizes points in k-dimensional space for fast nearest-neighbor lookups. Building the tree once and then querying it many times is far faster than brute-force search, especially for large datasets.

from scipy.spatial import KDTree
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
points = rng.random((300, 2))
query = np.array([[0.5, 0.5]])

tree = KDTree(points)
distances, indices = tree.query(query, k=5)

print("5 nearest neighbors:")
print("  Indices:  ", indices[0])
print("  Distances:", distances[0].round(4))

plt.figure(figsize=(6, 6))
plt.scatter(points[:, 0], points[:, 1], s=10, color="steelblue", alpha=0.4, label="All points")
plt.scatter(*query[0], s=120, color="black", zorder=5, label="Query (0.5, 0.5)")
plt.scatter(points[indices[0], 0], points[indices[0], 1],
            s=80, color="crimson", zorder=4, label="5 nearest")
for i in indices[0]:
    plt.plot([query[0, 0], points[i, 0]], [query[0, 1], points[i, 1]],
             color="crimson", linewidth=0.8, alpha=0.6)
plt.title("KDTree: 5 Nearest Neighbors")
plt.legend()
plt.tight_layout()
plt.show()
5 nearest neighbors:
  Indices:   [223 160 268  37 146]
  Distances: [0.0276 0.0663 0.0688 0.0748 0.0753]
- `KDTree(points)` builds the tree from an (N, 2) array of 2D coordinates; it generalizes to any number of dimensions.
- `tree.query(query, k=5)` returns the distances and indices of the 5 closest points to the query location; pass a 2D array to query multiple points at once.
- The lines from the query to its neighbors make it easy to visually verify the result.

### Pairwise Distance Matrices with cdist

`cdist` computes the distance between every point in one set and every point in another, returning a matrix of shape (m, n). You can specify any metric: Euclidean, Manhattan, cosine, and more.

from scipy.spatial.distance import cdist
import numpy as np

cities_A = np.array([[0, 0], [1, 0], [0, 1]], dtype=float)   # group A
cities_B = np.array([[2, 2], [3, 0], [-1, 2]], dtype=float)  # group B

D = cdist(cities_A, cities_B, metric="euclidean")
print("Distance matrix (A vs B):")
print(D.round(3))
print(f"\nClosest pair: A[{D.argmin()//D.shape[1]}] to B[{D.argmin()%D.shape[1]}], dist={D.min():.3f}")
Distance matrix (A vs B):
[[2.828 3.    2.236]
 [2.236 2.    2.828]
 [2.236 3.162 1.414]]

Closest pair: A[2] to B[2], dist=1.414
- `cdist(A, B)` returns a matrix where `D[i, j]` is the distance from `A[i]` to `B[j]`.
- For within-group distances, use `cdist(A, A)` or `scipy.spatial.distance.pdist(A)` which only computes the upper triangle.
- Switching `metric="cosine"` would compute cosine distances instead — useful for comparing high-dimensional text or image vectors.

### Voronoi Diagram

A Voronoi diagram partitions the plane into regions: each region contains all points closer to its seed point than to any other. Voronoi diagrams arise in competitive facility location, cell modeling in biology, and mesh generation.

from scipy.spatial import Voronoi, voronoi_plot_2d
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
points = rng.random((18, 2))

vor = Voronoi(points)

fig, ax = plt.subplots(figsize=(6, 6))
voronoi_plot_2d(vor, ax=ax, show_vertices=False,
                line_colors="steelblue", line_width=1.5, point_size=6)
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
ax.set_title("Voronoi Diagram")
ax.set_xlabel("x")
ax.set_ylabel("y")
plt.tight_layout()
plt.show()

print(f"Number of Voronoi regions: {len(vor.regions)}")
Number of Voronoi regions: 19
- `Voronoi(points)` computes the diagram; `voronoi_plot_2d` handles the plotting including infinite ridges extending to the boundary.
- `vor.ridge_points` lists the pairs of seed points that share a boundary (a ridge), which is useful for constructing the Delaunay triangulation.
- The diagram is clipped at the axis limits — ridges extending to infinity are truncated at the plot boundary.

### Delaunay Triangulation

Delaunay triangulation connects a set of points into non-overlapping triangles such that no point lies inside the circumcircle of any triangle. It's the standard way to create a mesh from an irregular point cloud for finite-element analysis or surface interpolation.

from scipy.spatial import Delaunay
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
points = rng.random((25, 2))
tri = Delaunay(points)

plt.figure(figsize=(6, 6))
plt.triplot(points[:, 0], points[:, 1], tri.simplices,
            color="steelblue", linewidth=1)
plt.plot(points[:, 0], points[:, 1], "o", color="crimson", markersize=6)
plt.title(f"Delaunay Triangulation ({len(tri.simplices)} triangles)")
plt.xlabel("x")
plt.ylabel("y")
plt.tight_layout()
plt.show()

print(f"Points: {len(points)},  Triangles: {len(tri.simplices)}")
Points: 25,  Triangles: 41
- `Delaunay(points)` returns an object whose `simplices` attribute is an (n_triangles, 3) array of point indices making up each triangle.
- `plt.triplot` draws the triangulation directly from the point coordinates and simplex indices.
- The Voronoi diagram is the geometric dual of the Delaunay triangulation: every Voronoi edge is perpendicular to the corresponding Delaunay edge.

`scipy.spatial` covers the most common computational geometry tasks without needing a specialized library. For the next step — analyzing graphs and networks built from spatial data — see [NetworkX graph analysis](/tutorials/networkx).