
Fix cost_distance Dijkstra heap overflow on non-uniform friction #3370

Merged
brendancol merged 2 commits into xarray-contrib:main from brendancol:deep-sweep-accuracy-cost_distance-2026-06-16
Jun 17, 2026

@brendancol

Contributor

Fixes #3369.

Problem

The numba Dijkstra kernels in cost_distance sized their binary min-heap at height * width. A lazy-deletion min-heap enqueues a pixel every time its tentative cost improves, so on grids with non-uniform friction the push count exceeds height * width and _heap_push writes past the end of the h_keys/h_rows/h_cols arrays. That out-of-bounds write corrupts numba-managed memory.
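To make the push-count argument concrete, here is a minimal pure-Python sketch of a lazy-deletion Dijkstra that counts pushes. It is not the numba kernel: the function name, the 4-connectivity, and the edge-cost model (mean friction of the two pixels times a step length of 1) are assumptions chosen so the example can be traced by hand.

```python
import heapq
import math

# 4-connected neighbour offsets (row, col).
NEIGHBORS_4 = ((-1, 0), (1, 0), (0, -1), (0, 1))


def lazy_dijkstra_push_count(friction, source):
    """Run lazy-deletion Dijkstra on a 2-D friction grid.

    Returns (dist, pushes). Assumed edge cost: mean friction of the two
    pixels times the step length (1 for 4-connectivity).
    """
    height, width = len(friction), len(friction[0])
    dist = [[math.inf] * width for _ in range(height)]
    dist[source[0]][source[1]] = 0.0
    heap = [(0.0, source[0], source[1])]
    pushes = 1  # the seed counts as a push

    while heap:
        cost, r, c = heapq.heappop(heap)
        if cost > dist[r][c]:
            continue  # stale entry: a cheaper one was pushed later
        for dr, dc in NEIGHBORS_4:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < height and 0 <= nc < width):
                continue
            new_cost = cost + 0.5 * (friction[r][c] + friction[nr][nc])
            if new_cost < dist[nr][nc]:
                dist[nr][nc] = new_cost
                # The superseded entry stays in the heap until it is popped.
                heapq.heappush(heap, (new_cost, nr, nc))
                pushes += 1
    return dist, pushes


# One high-friction pixel next to the source is enough to force a re-push.
friction = [[1.0, 4.0, 1.0],
            [1.0, 1.0, 1.0]]
dist, pushes = lazy_dijkstra_push_count(friction, (0, 0))
print(pushes, "pushes for", 2 * 3, "pixels")  # 7 pushes for 6 pixels
```

The top-right pixel is first reached through the expensive pixel at cost 5 and later improved to 4 through the bottom row, so it is pushed twice. With uniform friction the same grid needs exactly 6 pushes.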

In practice the iterative dask path aborts the interpreter (corrupted size vs prev_size / SIGABRT, exit 134) on adversarial friction. The numpy single-tile path does not crash on small grids because the overflow lands on an adjacent allocation, but it is still undefined behavior.

The CuPy relaxation kernel is a parallel Bellman-Ford and does not use this heap, so the GPU path was unaffected.
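For contrast, a relaxation-style solver keeps no priority queue at all. The sketch below is a CPU stand-in written with NumPy, not the CuPy kernel: the function name, the 4-connectivity, and the mean-friction edge cost are assumptions made for illustration.

```python
import numpy as np

# 4-connected offsets (row, col); each step has length 1.
OFFSETS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def relax_until_stable(friction, source):
    """Jacobi-style Bellman-Ford on a friction grid.

    Every pixel relaxes from all of its neighbours in each sweep, so the
    only state is the distance array itself; there is no heap to size.
    """
    h, w = friction.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    # Pad with inf so border pixels see "unreachable" outside the grid.
    f_pad = np.pad(friction, 1, constant_values=np.inf)
    while True:
        d_pad = np.pad(dist, 1, constant_values=np.inf)
        best = dist.copy()
        for dr, dc in OFFSETS:
            # Shifted views: distance and friction of the neighbour pixel.
            nd = d_pad[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            nf = f_pad[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            best = np.minimum(best, nd + 0.5 * (friction + nf))
        if np.array_equal(best, dist):
            return dist  # a full sweep changed nothing: converged
        dist = best


friction = np.array([[1.0, 4.0, 1.0],
                     [1.0, 1.0, 1.0]])
print(relax_until_stable(friction, (0, 0)))
```

The trade-off is more total work (one full-grid sweep per iteration) in exchange for a fixed memory footprint and independent per-pixel updates, which is what makes this style suit a GPU.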

Fix

Size the heap to the real upper bound on pushes:

  • _cost_distance_kernel: height * width * (n_neighbors + 1) (directed edges plus one seed per pixel).
  • _cost_distance_tile_kernel: the same, plus 2 * (width + height) + 4 for the phase-2 boundary seeds.
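The two sizes above can be written as one small helper. This is only an illustration of the arithmetic: `heap_capacity` and its `tiled` flag are hypothetical names, not functions in the library. The reasoning in the comments assumes each pixel is expanded at most once, which holds for a lazy-deletion Dijkstra that skips stale entries.

```python
def heap_capacity(height, width, n_neighbors, tiled=False):
    """Upper bound on heap pushes, following the sizing described above.

    `heap_capacity` and `tiled` are illustrative names only.
    """
    # A pixel relaxes its outgoing edges once, when it is settled, so each
    # directed edge triggers at most one push; each pixel can also be
    # seeded once.
    capacity = height * width * (n_neighbors + 1)
    if tiled:
        # Headroom for the phase-2 boundary seeds of the tile kernel.
        capacity += 2 * (width + height) + 4
    return capacity


print(heap_capacity(512, 512, 8))              # 2359296
print(heap_capacity(512, 512, 8, tiled=True))  # 2361348
```

For a 512 x 512 grid with 8-connectivity this is 9x the old `height * width` allocation of 262144 entries per heap array.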

The worst observed push count is about 1.9 × height * width, well inside the new bound of height * width * (n_neighbors + 1).

Tests

Two regression tests compare the numpy and iterative-dask paths against a reference heapq Dijkstra (unbounded heap, so it cannot overflow) over many random grids with strongly varying friction and uneven chunks. Before the fix the iterative test aborts the process; after it, all 88 tests in test_cost_distance.py pass.
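The shape of such a test can be sketched without the library. Below, a `heapq` reference is compared against a fixed-capacity array heap that raises instead of writing out of bounds. Everything here is an assumption for illustration: the function names, the 8-connected mean-friction cost model, and the grid sizes. The real tests exercise the numpy and dask code paths, not this stand-in.

```python
import heapq
import numpy as np

# 8-connected offsets with their step lengths.
OFFSETS = [(dr, dc, float(np.hypot(dr, dc)))
           for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]


def reference_dijkstra(friction, source):
    """Reference solver: the heapq list grows as needed."""
    h, w = friction.shape
    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source[0], source[1])]
    while heap:
        cost, r, c = heapq.heappop(heap)
        if cost > dist[r, c]:
            continue  # stale entry
        for dr, dc, step in OFFSETS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                new = cost + step * 0.5 * (friction[r, c] + friction[nr, nc])
                if new < dist[nr, nc]:
                    dist[nr, nc] = new
                    heapq.heappush(heap, (new, nr, nc))
    return dist


def bounded_dijkstra(friction, source, capacity):
    """Same algorithm on fixed-size arrays; raises instead of overflowing."""
    h, w = friction.shape
    keys = np.empty(capacity)
    cells = np.empty(capacity, dtype=np.int64)  # flattened pixel index
    size = 0

    def push(key, cell):
        nonlocal size
        if size >= capacity:
            raise OverflowError("heap capacity exceeded")
        i = size
        size += 1
        while i > 0 and keys[(i - 1) // 2] > key:  # sift up
            p = (i - 1) // 2
            keys[i], cells[i] = keys[p], cells[p]
            i = p
        keys[i], cells[i] = key, cell

    def pop():
        nonlocal size
        top = (keys[0], cells[0])
        size -= 1
        key, cell = keys[size], cells[size]  # last entry refills the hole
        i = 0
        while True:  # sift down
            child = 2 * i + 1
            if child >= size:
                break
            if child + 1 < size and keys[child + 1] < keys[child]:
                child += 1
            if keys[child] >= key:
                break
            keys[i], cells[i] = keys[child], cells[child]
            i = child
        keys[i], cells[i] = key, cell
        return top

    dist = np.full((h, w), np.inf)
    dist[source] = 0.0
    push(0.0, source[0] * w + source[1])
    while size > 0:
        cost, cell = pop()
        r, c = divmod(int(cell), w)
        if cost > dist[r, c]:
            continue  # stale entry
        for dr, dc, step in OFFSETS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                new = cost + step * 0.5 * (friction[r, c] + friction[nr, nc])
                if new < dist[nr, nc]:
                    dist[nr, nc] = new
                    push(new, nr * w + nc)
    return dist


rng = np.random.default_rng(0)
for _ in range(20):
    h, w = (int(v) for v in rng.integers(3, 12, size=2))
    # Friction spread over several orders of magnitude.
    friction = 10.0 ** rng.uniform(-2, 2, size=(h, w))
    source = (int(rng.integers(h)), int(rng.integers(w)))
    capacity = h * w * (len(OFFSETS) + 1)
    expected = reference_dijkstra(friction, source)
    actual = bounded_dijkstra(friction, source, capacity)
    assert np.allclose(actual, expected)
```

Because the bounded version checks its capacity on every push, the same harness can be rerun with `capacity = h * w` to see whether the old sizing trips the guard on a given grid.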

Verified on a CUDA host: numpy, cupy, dask+numpy, and dask+cupy all agree across 30-40 random adversarial grids with barriers (0 / NaN / Inf friction), multiple sources, both connectivities, and finite/infinite max_cost.

…ib#3369)

The numba Dijkstra kernels sized their binary min-heap at height*width.
A lazy-deletion min-heap pushes a pixel every time its tentative cost
improves, so the push count exceeds height*width on grids with varying
friction, and _heap_push wrote past the end of the heap arrays. That is
an out-of-bounds write into numba-managed memory, which corrupts the heap
and aborts the process (SIGABRT) on the iterative dask path and is
undefined behavior on the numpy path.

Size the heap to the real bound: directed edges plus one seed per pixel,
height*width*(n_neighbors+1). The tile kernel adds boundary-seed headroom
on top. Add regression tests that compare both the numpy and iterative
dask paths against a reference heapq Dijkstra over many adversarial grids.
github-actions bot added the performance label (PR touches performance-sensitive code) on Jun 16, 2026
brendancol merged commit 525eacc into xarray-contrib:main on Jun 17, 2026
12 checks passed

Development

Successfully merging this pull request may close these issues.

cost_distance: Dijkstra heap undersized at height*width causes out-of-bounds writes
