Why does STRtree return false negatives for rotated labels?

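STRtree indexes axis-aligned envelopes. A label's box is often computed from its unrotated width and height, and the text is then rotated at render time. When that happens, the stored box no longer covers the visual footprint, so collisions are missed. Build the rotated polygon first, then index and query its axis-aligned envelope (the MBR of the rotated polygon). The envelope is larger than the rotated label, so it can report a few conservative collisions, but it never misses one.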
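A minimal sketch of the fix. It assumes the label is a w × h rectangle centred on (cx, cy) and rotated about that centre. `rotated_label_envelope` is a hypothetical helper; `shapely.affinity.rotate` and `.envelope` are standard Shapely calls.

```python
from shapely.affinity import rotate
from shapely.geometry import Point, Polygon, box


def rotated_label_envelope(cx: float, cy: float, w: float, h: float,
                           angle_deg: float) -> Polygon:
    """Axis-aligned envelope of a w x h label rotated angle_deg about its centre."""
    unrotated = box(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
    # shapely.affinity.rotate takes degrees by default; positive is counter-clockwise
    rotated = rotate(unrotated, angle_deg, origin=(cx, cy))
    return rotated.envelope  # covers the whole rotated footprint


unrotated_box = box(-5, -1, 5, 1)               # 10 x 2 label as measured, unrotated
env = rotated_label_envelope(0, 0, 10, 2, 90)   # the same label turned 90 degrees
probe = Point(0, 4)                             # a point inside the rotated label
missed_by_raw_box = not unrotated_box.contains(probe)  # True: the raw box misses it
covered_by_envelope = env.contains(probe)              # True: the envelope covers it
```

Indexing `env` rather than `unrotated_box` is what closes the gap. For diagonal angles the envelope grows noticeably; at 45° this 10 × 2 label has an envelope about 8.5 units wide.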

How do halo radii affect collision bounding boxes?

Text rendering engines add halos that extend beyond raw glyph bounds. Collision logic that ignores halos will pass algorithmic checks but produce visible overlaps in the exported map. Always inflate each bounding box by the maximum halo_radius before inserting it into the spatial index.
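To see the failure mode, take two labels whose raw boxes sit 2 units apart, each drawn with a 1.5-unit halo. This is a minimal sketch; `inflate` is a hypothetical helper that expands a geometry's axis-aligned bounds on every side.

```python
from shapely.geometry import Polygon, box


def inflate(geom, margin: float) -> Polygon:
    """Expand the geometry's axis-aligned bounds by margin on every side."""
    minx, miny, maxx, maxy = geom.bounds
    return box(minx - margin, miny - margin, maxx + margin, maxy + margin)


halo = 1.5
raw_a = box(0, 0, 10, 4)
raw_b = box(12, 0, 22, 4)  # 2-unit gap between the raw glyph boxes

raw_clash = raw_a.intersects(raw_b)                                 # False: raw check passes
halo_clash = inflate(raw_a, halo).intersects(inflate(raw_b, halo))  # True: halos overlap by 1 unit
```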

Can I use geographic coordinates for bounding box collision checks?

No. Degree-based distances are not uniform — one degree of longitude varies from ~111 km at the equator to zero at the poles. Always reproject to a metric CRS (EPSG:3857 or an appropriate UTM zone) before computing text bounding boxes and running collision queries.
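The arithmetic behind this answer uses a spherical approximation with the IUGG mean Earth radius of 6371.0088 km. The WGS84 ellipsoid gives about 111.32 km per degree at the equator instead. `km_per_degree_longitude` is a hypothetical helper.

```python
import math

EARTH_MEAN_RADIUS_KM = 6371.0088  # mean radius; spherical approximation


def km_per_degree_longitude(lat_deg: float) -> float:
    """Length of one degree of longitude at the given latitude on a sphere."""
    return math.radians(1.0) * EARTH_MEAN_RADIUS_KM * math.cos(math.radians(lat_deg))


for lat in (0, 30, 60, 80):
    print(f"{lat:>2} deg: {km_per_degree_longitude(lat):6.1f} km")
```

The loop prints roughly 111.2, 96.3, 55.6 and 19.3 km. A degree-sized label box therefore covers about six times less ground at 80° than at the equator.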

How often should I rebuild the STRtree during batch placement?

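Rebuilding after every insertion means n constructions over up to n boxes each, so total build cost grows as O(n² log n). Instead, rebuild the STRtree every 500–1 000 placements. Keep the boxes placed since the last rebuild in a short pending list, and have each query check that list by brute force. The pending check is required for correctness: without it, recently placed labels are invisible to queries and can be overlapped.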
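A minimal sketch of that batching pattern. `BatchedLabelIndex` is a hypothetical class; it relies only on `shapely.strtree.STRtree` and its `query(..., predicate="intersects")` option.

```python
from shapely.geometry import Polygon, box
from shapely.strtree import STRtree


class BatchedLabelIndex:
    """Collision index that rebuilds its STRtree only every `rebuild_interval` additions.

    Boxes added since the last rebuild live in a pending list and are checked by
    brute force, so no placed label is ever invisible to a query.
    """

    def __init__(self, obstacles=(), rebuild_interval: int = 500) -> None:
        self._indexed: list[Polygon] = list(obstacles)
        self._pending: list[Polygon] = []
        self._interval = rebuild_interval
        self._tree = STRtree(self._indexed) if self._indexed else None
        self.rebuilds = 0

    def collides(self, cand: Polygon) -> bool:
        # Envelope filter plus exact intersects test in one STRtree call
        if self._tree is not None and len(self._tree.query(cand, predicate="intersects")):
            return True
        # Linear scan over the (at most rebuild_interval - 1) unindexed boxes
        return any(p.intersects(cand) for p in self._pending)

    def add(self, geom: Polygon) -> None:
        self._pending.append(geom)
        if len(self._pending) >= self._interval:
            self._indexed.extend(self._pending)
            self._pending.clear()
            self._tree = STRtree(self._indexed)  # STRtree is immutable, so rebuild
            self.rebuilds += 1


index = BatchedLabelIndex(rebuild_interval=2)
index.add(box(0, 0, 1, 1))
hit_pending = index.collides(box(0.5, 0.5, 2, 2))  # found via the pending list
index.add(box(5, 5, 6, 6))                         # second add triggers a rebuild
hit_indexed = index.collides(box(5.5, 5.5, 7, 7))  # found via the STRtree
```

The interval trades rebuild cost against the length of the linear scan. A small interval makes rebuilds frequent; a large one makes each query scan more pending boxes.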

Label Collision Avoidance Algorithms

Automated map generation pipelines frequently fail at the final rendering stage: overlapping text, obscured point features, and unreadable dense urban grids. Label collision avoidance solves this by mathematically evaluating candidate positions, detecting spatial conflicts, and resolving overlaps before export. Implementing robust collision logic inside your Python pipeline transforms brittle static layouts into production-ready, scalable workflows that require no manual typographic cleanup.

This topic sits at the engineering core of Programmatic Map Styling and Label Automation, where typography must adapt dynamically to feature density, basemap complexity, and output resolution.

Prerequisites and Environment Configuration

Before implementing collision avoidance logic, ensure your environment meets these technical requirements:

Python 3.10+ with geopandas>=0.14, shapely>=2.0, matplotlib>=3.8, and pandas>=2.0
Metric CRS consistency: All input layers must be projected to the same metric coordinate reference system (EPSG:3857 or an appropriate UTM zone). Degree-based bounding boxes shrink in the east-west direction as latitude increases, which makes collision math inconsistent across the map. Validate with gdf.crs.is_projected; note that this confirms a projected CRS, not that its units are metres.
Font metric awareness: Text dimensions vary by typeface, weight, and DPI. Approximate rendered bounding boxes using matplotlib.textpath.TextPath (extents in points at the given font size) or PIL.ImageFont.FreeTypeFont.getbbox (extents in pixels). Convert those dimensions to map units at your export scale; the result is your raw collision geometry before padding.
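Shapely STRtree: Available directly as shapely.strtree.STRtree (Shapely 2.x). With Shapely 2 installed, GeoDataFrame.sindex is itself built on Shapely's STRtree, so performance is similar. Using STRtree directly lets you index a plain list of label boxes that does not belong to any GeoDataFrame.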
No API keys required: All tooling in this guide is open-source. Data sources can be Natural Earth shapefiles, OSM extracts via osmnx, or any vector format readable by geopandas.read_file.
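The font-metric requirement above hides a unit conversion. TextPath reports extents in points (1/72 inch), not map units. The sketch below assumes the viewport's full width in map units fills an exported figure of known width in inches. `text_size_map_units` is a hypothetical helper, and the ratio is independent of export DPI.

```python
from matplotlib.textpath import TextPath


def text_size_map_units(text: str, fontsize_pt: float, extent_width: float,
                        fig_width_in: float) -> tuple[float, float]:
    """Approximate (width, height) of rendered text in map units.

    extent_width: width of the exported viewport in map units (e.g. metres).
    fig_width_in: width of the exported figure in inches.
    """
    bb = TextPath((0, 0), text, size=fontsize_pt).get_extents()  # glyph outlines, in points
    units_per_pt = extent_width / (fig_width_in * 72.0)           # 72 points per inch
    return bb.width * units_per_pt, bb.height * units_per_pt


# A 20 km wide viewport exported 8 inches wide: 1 pt covers about 34.7 m on the ground.
w_m, h_m = text_size_map_units("Springfield", 8, extent_width=20_000, fig_width_in=8)
```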

Conceptual Foundation

At its heart, label placement is a constrained optimization problem: given n features each with k candidate positions, find an assignment that maximises typographic quality (proximity to anchor, alignment with feature) subject to the constraint that no two placed label bounding boxes intersect.

The problem is NP-hard in the general case, so production systems use greedy heuristics with deterministic ordering. The dominant heuristic: sort features by priority descending, then for each feature try candidates in quality order and lock the first non-colliding position. This approximation is fast (linear in features, sub-linear per query via spatial indexing) and produces reproducible results, which matters for CI/CD pipelines.

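STRtree (Sort-Tile-Recursive tree) is the spatial index of choice. It bulk-loads an R-tree in four steps: sort bounding boxes on x, slice them into vertical strips, sort each strip on y, and pack the result into nodes. This yields a well-filled, shallow tree. Typical queries visit O(log n) nodes plus the matches they return, although the worst case is linear. Shapely's STRtree is immutable after construction, so newly placed labels are added by rebuilding it. Batching those rebuilds (rather than rebuilding after every placement) keeps construction from dominating the runtime.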

Two assumptions must hold for the greedy algorithm to give correct results:

Bounding boxes are inflated to include halo radius, stroke width, and padding — not just the raw glyph extent.
Sort order is fully deterministic: when priority weights tie, break ties by a stable secondary key (e.g., feature_id) so outputs are reproducible across Python interpreter restarts.
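The greedy heuristic and both assumptions fit in a few lines. This is a minimal pure-Python sketch. It assumes candidate boxes are already inflated and stored as (minx, miny, maxx, maxy) tuples, and it uses a brute-force overlap check in place of STRtree. `Feature`, `rects_intersect` and `greedy_place` are hypothetical names.

```python
from typing import NamedTuple

Rect = tuple[float, float, float, float]  # (minx, miny, maxx, maxy), already inflated


class Feature(NamedTuple):
    feature_id: str
    priority: int
    candidates: list[Rect]  # best cartographic quality first


def rects_intersect(a: Rect, b: Rect) -> bool:
    # Touching edges count as a collision, matching Shapely's intersects()
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def greedy_place(features: list[Feature]) -> dict[str, Rect]:
    placed: dict[str, Rect] = {}
    # Priority descending, then feature_id ascending as the stable tie-breaker
    for f in sorted(features, key=lambda f: (-f.priority, f.feature_id)):
        for cand in f.candidates:
            if not any(rects_intersect(cand, r) for r in placed.values()):
                placed[f.feature_id] = cand  # lock the first non-colliding candidate
                break
    return placed


features = [
    Feature("b", 5, [(0, 0, 2, 1)]),
    Feature("a", 5, [(1, 0, 3, 1), (5, 0, 7, 1)]),
]
result = greedy_place(features)  # "a" wins the tie on feature_id; "b" is suppressed
```

Reversing the input list gives the same result, which is the reproducibility property the second assumption asks for.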

Step-by-Step Implementation

Step 1 — Candidate Generation

For each feature, generate a ranked list of candidate label positions. The specific anchors depend on geometry type:

Points: 8 cardinal/intercardinal offsets from the point, ordered by cartographic convention (right > upper-right > lower-right > left …).
Lines: midpoint or quarter-point anchors, with orientation parallel to the local tangent. Horizontal variants act as fallbacks.
Polygons: centroid first; if the centroid falls outside the polygon (concave shapes), substitute the Polylabel result or the centroid of the largest interior rectangle.
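The polygon rule can be sketched with shapely.ops.polylabel, Shapely's pole-of-inaccessibility implementation, as the substitute. This is a minimal sketch: `polygon_anchor` is a hypothetical helper, and `tolerance` is in map units.

```python
from shapely.geometry import Point, Polygon, box
from shapely.ops import polylabel


def polygon_anchor(poly: Polygon, tolerance: float = 1.0) -> Point:
    """Centroid if it lies inside the polygon, else the pole of inaccessibility."""
    centroid = poly.centroid
    if poly.contains(centroid):
        return centroid
    # Concave shapes: fall back to the interior point (approximately) farthest from the boundary
    return polylabel(poly, tolerance=tolerance)


u_shape = Polygon([(0, 0), (10, 0), (10, 10), (7, 10), (7, 3), (3, 3), (3, 10), (0, 10)])
u_anchor = polygon_anchor(u_shape, tolerance=0.1)  # centroid (5, ~4.42) falls in the notch
rect_anchor = polygon_anchor(box(0, 0, 4, 2))       # convex: centroid (2, 1) is kept
```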

Each candidate must be converted to an axis-aligned bounding box that accounts for font metrics, padding, and halo:

from shapely.geometry import Polygon, box

def candidate_bbox(
    anchor_x: float,
    anchor_y: float,
    text_width: float,
    text_height: float,
    halo_radius: float = 1.5,
    padding: float = 2.0,
) -> Polygon:
    """Axis-aligned collision box centred on the anchor.

    All sizes must share one unit (map units, after converting text metrics
    from points); halo_radius and padding are added on every side.
    """
    half_w = (text_width / 2) + halo_radius + padding
    half_h = (text_height / 2) + halo_radius + padding
    return box(
        anchor_x - half_w, anchor_y - half_h,
        anchor_x + half_w, anchor_y + half_h,
    )

Discard candidates whose bounding boxes are not fully inside the viewport extent before querying the index. A label that starts off-canvas is cropped on export, and testing it only adds query work.
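A minimal sketch of the viewport filter. `filter_to_viewport` is a hypothetical helper, and the optional `inset` (a negative buffer of the viewport) is an assumption that leaves a clean margin at the edges.

```python
from shapely.geometry import Polygon, box


def filter_to_viewport(candidates: list[Polygon], viewport: Polygon,
                       inset: float = 0.0) -> list[Polygon]:
    """Keep only candidates fully inside the viewport, shrunk by `inset` map units."""
    # A negative buffer shrinks the viewport polygon inward
    safe = viewport.buffer(-inset) if inset > 0 else viewport
    return [c for c in candidates if safe.contains(c)]


viewport = box(0, 0, 100, 100)
cands = [box(10, 10, 20, 15), box(95, 10, 105, 15), box(0, 50, 10, 55)]
kept = filter_to_viewport(cands, viewport, inset=0.5)
```

Without the inset, the third box survives because `contains` accepts geometry that touches the boundary. With a 0.5-unit inset, only the first box remains.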

Step 2 — Priority Assignment

Rank features using a numeric weight that maps to cartographic importance. Store the weight as a column on your GeoDataFrame so it travels with the geometry through all subsequent steps:

PRIORITY_MAP = {
    "capital":    10,
    "city":        8,
    "town":        6,
    "village":     4,
    "hamlet":      2,
    "hydrography": 2,
    "poi":         1,
}

gdf["label_priority"] = gdf["feature_type"].map(PRIORITY_MAP).fillna(1).astype(int)
# Deterministic secondary sort within tied priorities
gdf = gdf.sort_values(
    ["label_priority", "feature_id"], ascending=[False, True]
).reset_index(drop=True)

This weighting integrates naturally with rule-based styling engines, where the same attribute-driven conditions that control symbol colour and size also control collision tolerance. For example, a rule that upsizes capitals to 14 pt should simultaneously raise their label_priority to 10.
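One way to keep styling and priority in sync is a single rule table that drives both columns. In the sketch below, only the capital entry (14 pt, priority 10) and the priority values come from this guide; the other font sizes, along with the names `STYLE_RULES` and `apply_style_rules`, are illustrative assumptions.

```python
import pandas as pd

# Hypothetical rule table: feature_type -> (fontsize_pt, label_priority).
# One entry drives both values, so a styling change cannot drift from collision priority.
STYLE_RULES: dict[str, tuple[float, int]] = {
    "capital": (14.0, 10),
    "city": (11.0, 8),
    "town": (9.0, 6),
    "village": (8.0, 4),
    "hamlet": (7.0, 2),
    "poi": (7.0, 1),
}
DEFAULT_RULE: tuple[float, int] = (7.0, 1)


def apply_style_rules(df: pd.DataFrame, type_col: str = "feature_type") -> pd.DataFrame:
    """Return a copy of df with label_fontsize and label_priority columns."""
    rules = [STYLE_RULES.get(t, DEFAULT_RULE) for t in df[type_col]]
    out = df.copy()
    out["label_fontsize"] = [r[0] for r in rules]
    out["label_priority"] = [r[1] for r in rules]
    return out


styled = apply_style_rules(pd.DataFrame({"feature_type": ["capital", "poi", "unknown"]}))
```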

Step 3 — STRtree Index Construction

Build the initial index over any existing drawn features (basemap geometry bounding boxes, borders, water masks) that labels must avoid:

from shapely.strtree import STRtree

# pre_placed: list of Shapely geometries already committed to canvas
index_geoms: list = list(pre_placed)
tree = STRtree(index_geoms)

Expand basemap obstacle bounding boxes by the same halo radius used in Step 1 before inserting them, so obstacles receive the same clearance margin as labels.
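For line obstacles such as roads, an expanded envelope can block far more space than the road occupies, because the envelope of a long diagonal road is a large square. One variant, sketched here as an alternative rather than as this guide's method, buffers the obstacle geometry itself by the halo radius. `STRtree.query(..., predicate="intersects")` then applies the exact test. `build_obstacle_index` is a hypothetical helper.

```python
from shapely.geometry import LineString, box
from shapely.strtree import STRtree


def build_obstacle_index(obstacles, halo: float) -> STRtree:
    """Index basemap obstacles grown by the halo radius on every side."""
    return STRtree([geom.buffer(halo) for geom in obstacles])


road = LineString([(0, 0), (100, 100)])
tree = build_obstacle_index([road], halo=1.5)

label_far = box(80, 0, 90, 5)     # inside the road's envelope, far from the road itself
label_near = box(48, 49, 52, 53)  # straddles the road

envelope_clash = road.envelope.intersects(label_far)        # True: an envelope would block it
far_hits = tree.query(label_far, predicate="intersects")    # no hits
near_hits = tree.query(label_near, predicate="intersects")  # one hit: index 0
```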

Step 4 — Iterative Placement and Collision Detection

Process features in the priority order established in Step 2. For each feature, try candidates in quality order:

placed_boxes: list = list(pre_placed)  # starts with basemap obstacles
placements: dict = {}                  # row index -> chosen box; iterrows() rows are copies

for idx, row in gdf.iterrows():
    candidates = generate_candidates(row)  # placeholder: returns list of Shapely box geometries
    tree = STRtree(placed_boxes)           # rebuild per feature (or batch-rebuild; see Performance)

    for cand_box in candidates:
        hits = tree.query(cand_box)        # indices of placed_boxes whose envelopes overlap
        if not any(placed_boxes[i].intersects(cand_box) for i in hits):
            placed_boxes.append(cand_box)
            placements[idx] = cand_box
            break
    else:  # no candidate was free
        apply_fallback(row)                # placeholder; see Step 5

gdf["label_placed"] = gdf.index.isin(list(placements))

Use intersects() on the inflated boxes rather than a distance threshold. The halo and padding already encode the required clearance, so an exact intersection test gives a deterministic yes/no answer. For rotated labels, build the rotated polygon first, then index and query its axis-aligned envelope (cand.envelope) so that the stored geometry covers the label's real footprint (see FAQ).
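The two stages of an STRtree query are easy to see with a triangular obstacle. Without a predicate, `query` returns every tree geometry whose envelope overlaps the candidate's envelope. With `predicate="intersects"`, it also runs the exact test. The coordinates below are made up for illustration.

```python
from shapely.geometry import Polygon, box
from shapely.strtree import STRtree

placed = [Polygon([(0, 0), (10, 0), (0, 10)])]  # e.g. a triangular water body
tree = STRtree(placed)
cand = box(8, 8, 9, 9)  # inside the triangle's envelope, outside the triangle

envelope_hits = tree.query(cand)                       # envelope stage only: index 0
exact_hits = tree.query(cand, predicate="intersects")  # envelope plus exact test: no hits
```

Filtering `envelope_hits` manually with `placed_boxes[i].intersects(cand_box)`, as the loop above does, gives the same answer as the predicate form.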

This placement loop is also where dense urban maps apply the most pressure (see Solving Label Overlap in Dense Urban Maps with Python). Street grids and POI clusters create localised collision hotspots that can exhaust every candidate for low-priority features within a single tile.

Step 5 — Conflict Resolution and Fallback Strategies

When all candidates for a feature collide, trigger a tiered resolution routine rather than silently dropping the label:

Anchor shift: Rotate candidate offsets 90° or 180° and re-query. Effective when the default right-side anchor is blocked by a road label but the top anchor is free.
Leader line injection: Move the label to the nearest clear zone inside the viewport and draw a thin polyline (linewidth=0.4) connecting it to the original anchor. Validate that leader lines do not cross other features, and add their path geometries to the spatial index as obstacles.
Dynamic scaling: Reduce font size within defined limits (e.g., down to 75 % of nominal) and recompute bounding boxes. Effective for high-priority features like capitals where omission is unacceptable.
Suppression with logging: If no valid position exists after all fallbacks, suppress the label and log the feature_id, label_priority, bounding box of the winning obstacle, and candidate count. This log becomes the QA artefact for manual review.

Fallback logic must be deterministic. Randomised placement algorithms (simulated annealing without a fixed seed, genetic algorithms) introduce non-reproducible outputs, which breaks CI/CD pipelines and makes cartographic QA impossible across runs.
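A minimal sketch of a deterministic tiered resolver. The tiers are ordered lists of ready-made candidate boxes (anchor shifts, 75 % scaled boxes, and so on), and suppression is logged with the standard logging module. Leader lines are left out. `resolve_with_fallbacks`, the tier names and the toy obstacle are all hypothetical.

```python
import logging
from typing import Callable, Iterable, Optional, Sequence

Rect = tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
log = logging.getLogger("label_placer")


def resolve_with_fallbacks(
    feature_id: str,
    priority: int,
    tiers: Sequence[tuple[str, Iterable[Rect]]],
    collides: Callable[[Rect], bool],
) -> tuple[Optional[str], Optional[Rect]]:
    """Try each tier's candidates in a fixed order; return (tier_name, box) or (None, None).

    No randomness: the same tiers and obstacles always give the same answer.
    """
    tried = 0
    for tier_name, candidates in tiers:
        for cand in candidates:
            tried += 1
            if not collides(cand):
                return tier_name, cand
    # Every tier failed: suppress the label but leave a QA trail
    log.warning("suppressed feature_id=%s priority=%d candidates_tried=%d",
                feature_id, priority, tried)
    return None, None


obstacle: Rect = (0.0, 0.0, 10.0, 10.0)


def hits_obstacle(r: Rect) -> bool:
    return r[0] <= obstacle[2] and obstacle[0] <= r[2] and r[1] <= obstacle[3] and obstacle[1] <= r[3]


tiers = [
    ("default", [(5.0, 5.0, 15.0, 8.0)]),         # blocked by the obstacle
    ("anchor_shift", [(-12.0, 5.0, -2.0, 8.0)]),  # clear
]
result = resolve_with_fallbacks("f1", 6, tiers, hits_obstacle)
```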

Step 6 — Production Validation

After placement completes, run automated checks before committing the render:

from shapely.strtree import STRtree

def validate_placement(label_boxes: list) -> list[tuple[int, int]]:
    """Return (i, j) index pairs, i < j, whose label boxes intersect."""
    violations = []
    tree = STRtree(label_boxes)
    for i, geom in enumerate(label_boxes):
        # predicate="intersects" runs the exact test after the envelope filter
        for j in tree.query(geom, predicate="intersects"):
            if j > i:  # skip the self-hit and pairs already reported
                violations.append((i, int(j)))
    return violations

# Check labels only: basemap obstacles at the front of placed_boxes may overlap each other
violations = validate_placement(placed_boxes[len(pre_placed):])
if violations:
    raise RuntimeError(f"{len(violations)} label overlaps detected post-placement")

Also verify:

Priority compliance: No lower-priority label displaced a higher-priority one. Cross-check the suppression log against label_priority.
Viewport coverage: All placed bounding boxes lie within the export extent. Use viewport_polygon.contains(bbox) for each placed label.
Legend consistency: Synchronise typography rules with dynamic legend generation to ensure legend entries reflect only the labels that successfully rendered. Suppressed labels must not appear in the legend.
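The viewport and legend checks can share one pass over the placement results. This is a minimal sketch: `LabelResult` is a hypothetical record to adapt to your own output, and `qa_checks` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional

from shapely.geometry import Polygon, box


@dataclass
class LabelResult:
    feature_id: str
    category: str
    bbox: Optional[Polygon]  # None when suppressed


def qa_checks(results: list[LabelResult], viewport: Polygon) -> tuple[list[str], list[str]]:
    """Return (ids of placed labels outside the viewport, legend categories to show)."""
    placed = [r for r in results if r.bbox is not None]
    outside = [r.feature_id for r in placed if not viewport.contains(r.bbox)]
    rendered = [r for r in placed if viewport.contains(r.bbox)]
    # The legend lists only categories with at least one label that actually renders
    legend = sorted({r.category for r in rendered})
    return outside, legend


results = [
    LabelResult("1", "city", box(10, 10, 20, 15)),
    LabelResult("2", "poi", None),                # suppressed
    LabelResult("3", "town", box(95, 0, 105, 5)),  # crosses the viewport edge
]
outside, legend = qa_checks(results, box(0, 0, 100, 100))
```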

Complete Working Code Example

"""
label_placer.py — deterministic label collision avoidance for GeoDataFrame features.
Requires: shapely>=2.0, geopandas>=0.14, matplotlib>=3.8
"""
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterable

import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.textpath import TextPath
from shapely.geometry import Point, box
from shapely.strtree import STRtree


PRIORITY_MAP: dict[str, int] = {
    "capital": 10, "city": 8, "town": 6,
    "village": 4, "hamlet": 2, "poi": 1,
}
OFFSETS_PT = [  # (dx_frac, dy_frac) relative to label half-dimensions
    (1.2, 0.0), (1.2, 1.0), (0.0, 1.2), (-1.2, 1.0),
    (-1.2, 0.0), (-1.2, -1.0), (0.0, -1.2), (1.2, -1.0),
]


@dataclass
class PlacedLabel:
    feature_id: str
    bbox: object  # Shapely Polygon
    anchor_x: float
    anchor_y: float
    text: str
    priority: int
    suppressed: bool = False
    fallback_used: str = ""


def _font_metrics(text: str, fontsize: float, dpi: float = 96.0) -> tuple[float, float]:
    """Return (width_pts, height_pts) for text at fontsize in points."""
    tp = TextPath((0, 0), text, size=fontsize)
    bb = tp.get_extents()
    scale = dpi / 72.0  # points → screen units; for map units caller must divide by map_dpi
    return bb.width * scale, bb.height * scale


def _candidate_boxes(
    cx: float, cy: float, tw: float, th: float,
    halo: float = 1.5, pad: float = 2.0,
) -> list:
    hw = tw / 2 + halo + pad
    hh = th / 2 + halo + pad
    return [
        box(cx + dx * hw - hw, cy + dy * hh - hh,
            cx + dx * hw + hw, cy + dy * hh + hh)
        for dx, dy in OFFSETS_PT
    ]


def place_labels(
    gdf: gpd.GeoDataFrame,
    viewport_geom,
    feature_type_col: str = "feature_type",
    label_col: str = "name",
    fontsize: float = 8.0,
    halo_radius: float = 1.5,
    rebuild_interval: int = 500,
) -> list[PlacedLabel]:
    """
    Place labels for all features in gdf, returning a list of PlacedLabel records.
    gdf must be in a metric CRS.
    """
    assert gdf.crs is not None and gdf.crs.is_projected, (
        "gdf must use a projected (metric) CRS — reproject with gdf.to_crs(epsg=3857)"
    )

    gdf = gdf.copy()
    gdf["_priority"] = gdf[feature_type_col].map(PRIORITY_MAP).fillna(1).astype(int)
    gdf = gdf.sort_values(["_priority", label_col], ascending=[False, True]).reset_index(drop=True)

    placed_boxes: list = []
    placed_labels: list[PlacedLabel] = []
    suppression_log: list[dict] = []

    for idx, row in gdf.iterrows():
        geom: Point = row.geometry.centroid if row.geometry.geom_type != "Point" else row.geometry
        cx, cy = geom.x, geom.y
        text = str(row.get(label_col, ""))
        priority = int(row["_priority"])
        tw, th = _font_metrics(text, fontsize)

        candidates = _candidate_boxes(cx, cy, tw, th, halo=halo_radius)
        # filter off-viewport candidates early
        candidates = [c for c in candidates if viewport_geom.contains(c)]

        # batch STRtree rebuild
        if idx % rebuild_interval == 0 or len(placed_boxes) == 0:
            tree = STRtree(placed_boxes) if placed_boxes else STRtree([])

        chosen: object | None = None
        for cand in candidates:
            hits = tree.query(cand)
            if not any(placed_boxes[i].intersects(cand) for i in hits):
                chosen = cand
                break

        if chosen is None:
            # Fallback 1: try reduced font (75 %)
            tw2, th2 = tw * 0.75, th * 0.75
            for cand in _candidate_boxes(cx, cy, tw2, th2, halo=halo_radius):
                if not viewport_geom.contains(cand):
                    continue
                hits = tree.query(cand)
                if not any(placed_boxes[i].intersects(cand) for i in hits):
                    chosen = cand
                    break

        suppressed = chosen is None
        if not suppressed:
            placed_boxes.append(chosen)
            # invalidate tree so next batch rebuild picks it up
        else:
            suppression_log.append({"feature_id": row.get("feature_id", idx), "priority": priority})

        placed_labels.append(PlacedLabel(
            feature_id=str(row.get("feature_id", idx)),
            bbox=chosen,
            anchor_x=cx, anchor_y=cy,
            text=text, priority=priority,
            suppressed=suppressed,
            fallback_used="scale_75" if (not suppressed and chosen != candidates[0] if candidates else False) else "",
        ))

    if suppression_log:
        print(f"[label_placer] Suppressed {len(suppression_log)} labels. "
              f"Top suppressed priority: {max(r['priority'] for r in suppression_log)}")

    return placed_labels


def draw_placed_labels(ax: plt.Axes, placed: list[PlacedLabel]) -> None:
    for lbl in placed:
        if lbl.suppressed or lbl.bbox is None:
            continue
        minx, miny, maxx, maxy = lbl.bbox.bounds
        cx = (minx + maxx) / 2
        cy = (miny + maxy) / 2
        ax.text(cx, cy, lbl.text, ha="center", va="center",
                fontsize=8, color="black",
                bbox=dict(boxstyle="round,pad=0.1", fc="white", ec="none", alpha=0.7))

Performance Optimization Patterns

Brute-force O(n²) collision detection, which tests every candidate against every placed box, becomes impractical beyond roughly 3,000 features. Apply these techniques to keep placement fast:

Batch STRtree rebuilds: Shapely's STRtree is immutable, so adding labels means rebuilding it, and rebuilding after every insertion costs O(n² log n) overall. Rebuild every rebuild_interval placements instead (500–1 000 is a practical default), and check each candidate against both the tree and the short list of boxes placed since the last rebuild. Skipping that pending-list check is not harmless staleness: a label placed between rebuilds would be invisible to later queries, and any overlap it allows stays in the final render.
Viewport pre-filtering (O(1) per candidate for a rectangular viewport): Discard candidates whose bounding boxes are not fully inside the export extent before querying the index. This typically eliminates 30–60 % of candidates for edge features in tiled pipelines.

Font metric caching (O(1) per text string): TextPath construction is expensive. Memoize results by (text, fontsize, weight):

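from functools import lru_cache

from matplotlib.font_manager import FontProperties
from matplotlib.textpath import TextPath

@lru_cache(maxsize=4096)
def cached_font_metrics(text: str, fontsize: float, weight: str = "normal") -> tuple[float, float]:
    """(width, height) in points; all arguments are hashable, as lru_cache requires."""
    tp = TextPath((0, 0), text, size=fontsize, prop=FontProperties(weight=weight))
    bb = tp.get_extents()
    return bb.width, bb.height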

Spatial partitioning for large datasets: For datasets exceeding ~50,000 features, partition the GeoDataFrame into overlapping spatial tiles (e.g., 256 × 256 map-unit chunks with a 20-unit overlap buffer). Process each tile independently via concurrent.futures.ProcessPoolExecutor, then merge results while checking cross-boundary collisions in the overlap zones.
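The tiling step can be sketched in plain Python. `overlapping_tiles` is a hypothetical helper. Clipping features to each tile, the ProcessPoolExecutor fan-out, and the overlap-zone merge are left out.

```python
import math


def overlapping_tiles(minx: float, miny: float, maxx: float, maxy: float,
                      size: float, overlap: float) -> list[tuple[float, float, float, float]]:
    """Split an extent into size x size tiles, each grown by `overlap` on every side."""
    tiles = []
    nx = math.ceil((maxx - minx) / size)
    ny = math.ceil((maxy - miny) / size)
    for i in range(nx):
        for j in range(ny):
            x0 = minx + i * size
            y0 = miny + j * size
            # Clamp the core tile to the extent, then add the overlap buffer
            tiles.append((x0 - overlap, y0 - overlap,
                          min(x0 + size, maxx) + overlap, min(y0 + size, maxy) + overlap))
    return tiles


tiles = overlapping_tiles(0, 0, 512, 300, size=256, overlap=20)  # 2 x 2 grid
```

Labels whose boxes fall inside an overlap zone are the ones to re-check against neighbouring tiles during the merge.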

Common Pitfalls and Debugging

CRS mismatch: Mixing geographic (lon/lat) and projected coordinates distorts bounding box calculations badly. A label box 0.001° wide spans about 111 m at the equator but only about 56 m at 60° latitude, so its footprint is not consistent across the map. Always assert gdf.crs.is_projected before index construction.

Halo radius ignorance: This is the most common cause of visual overlaps on exported maps even when the algorithm reports zero collisions. Text rendering engines apply halos that extend beyond raw glyph bounds. Inflate every bounding box by halo_radius on every side (so width and height each grow by twice the radius) before inserting it into the index.

STRtree false negatives for rotated labels: STRtree.query() returns every indexed geometry whose envelope intersects the query geometry's envelope. The tree itself therefore never drops a true collider, even when you query with a rotated polygon. False negatives come from indexing the wrong geometry: a box built from the unrotated text width and height does not cover a label that is rotated at render time. Rotate the box to match the label (shapely.affinity.rotate) and index its envelope (cand.envelope). That envelope is larger than the actual footprint, which causes false positives (harmless; placement becomes slightly more conservative) but no false negatives.

Non-deterministic sort: Even a stable sort only preserves input order among ties, and pandas sort_values defaults to quicksort, which is not stable, when sorting on a single column. When many features share a label_priority and the input GeoDataFrame arrives in a different row order (common with database queries), placement order changes between runs. Always supply a unique secondary sort key (feature_id; names alone can repeat) to guarantee reproducible outputs across CI runs.
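A quick way to confirm the fix: shuffle the rows and check that the placement order does not change. The data and the `placement_order` helper are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "feature_id": ["f3", "f1", "f2", "f4"],
    "label_priority": [6, 6, 6, 8],
})


def placement_order(frame: pd.DataFrame) -> list[str]:
    # The unique secondary key makes the order independent of input row order
    ordered = frame.sort_values(["label_priority", "feature_id"], ascending=[False, True])
    return list(ordered["feature_id"])


order_a = placement_order(df)
order_b = placement_order(df.sample(frac=1, random_state=7))  # same rows, shuffled
```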

Floating-point precision at tile boundaries: Features near tile edges can produce bounding boxes that straddle the boundary by sub-pixel amounts, causing a label to appear in two tiles’ placement logs but render in neither. Apply a small negative inset (e.g., viewport_geom.buffer(-0.5)) to the viewport polygon during candidate filtering to create a clean exclusion margin.

Conclusion

Label collision avoidance is a foundational requirement for any automated cartographic pipeline, not an optional post-processing step. By combining deterministic candidate generation, STRtree spatial indexing, priority-ordered placement, and tiered fallback strategies, Python-based pipelines can produce publication-ready maps at scale without manual typographic intervention. The complete code example above is a working starting point for point feature labelling. Extend _candidate_boxes with tangent-aligned candidates for line labelling, or use shapely.ops.polylabel for polygon interior placement. As feature density increases and multi-scale tile generation becomes standard, robust collision logic directly reduces QA overhead and accelerates delivery timelines.

Solving Label Overlap in Dense Urban Maps with Python — Deep-dive on managing POI clusters and street-grid hotspots at high zoom levels.
Rule-Based Styling Engines — Attribute-driven conditions that set both symbol appearance and label priority weights simultaneously.
Dynamic Legend Generation — Keep legend entries in sync with labels that actually rendered after collision resolution.
Typography Rules for Maps — Cartographic conventions for typeface selection, hierarchy, and spacing that inform candidate ranking.
