Fix cost_distance Dijkstra heap overflow on non-uniform friction #3370
Merged
brendancol merged 2 commits on Jun 17, 2026
…ib#3369) The numba Dijkstra kernels sized their binary min-heap at height*width. A lazy-deletion min-heap pushes a pixel every time its tentative cost improves, so the push count exceeds height*width on grids with varying friction, and _heap_push wrote past the end of the heap arrays. That is an out-of-bounds write into numba-managed memory, which corrupts the heap and aborts the process (SIGABRT) on the iterative dask path and is undefined behavior on the numpy path. Size the heap to the real bound: directed edges plus one seed per pixel, height*width*(n_neighbors+1). The tile kernel adds boundary-seed headroom on top. Add regression tests that compare both the numpy and iterative dask paths against a reference heapq Dijkstra over many adversarial grids.
Fixes #3369.
Problem
The numba Dijkstra kernels in cost_distance sized their binary min-heap at height * width. A lazy-deletion min-heap enqueues a pixel every time its tentative cost improves, so on grids with non-uniform friction the push count exceeds height * width and _heap_push writes past the end of the h_keys / h_rows / h_cols arrays. That out-of-bounds write corrupts numba-managed memory.

In practice the iterative dask path aborts the interpreter (corrupted size vs prev_size / SIGABRT, exit 134) on adversarial friction. The numpy single-tile path does not crash on small grids because the overflow lands on an adjacent allocation, but it is still undefined behavior.

The CuPy relaxation kernel is a parallel Bellman-Ford and does not use this heap, so the GPU path was unaffected.
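To make the push-count argument concrete, here is a minimal pure-Python sketch of a lazy-deletion Dijkstra that counts heap pushes. It is not the numba kernel: the function name dijkstra_push_count, the 4-connected neighborhood, and the edge cost (mean friction of the two cells) are assumptions chosen for illustration only.

```python
import heapq
import math


def dijkstra_push_count(friction, source):
    """Lazy-deletion Dijkstra on a 4-connected grid.

    Assumed cost model (illustrative only): moving between two adjacent
    cells costs the mean of their friction values.
    Returns (dist, pushes), where pushes counts every heap insertion.
    """
    height, width = len(friction), len(friction[0])
    dist = [[math.inf] * width for _ in range(height)]
    sr, sc = source
    dist[sr][sc] = 0.0
    heap = [(0.0, sr, sc)]
    pushes = 1  # the seed is a push too

    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale entry left behind by a later improvement
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < height and 0 <= nc < width:
                nd = d + 0.5 * (friction[r][c] + friction[nr][nc])
                if nd < dist[nr][nc]:
                    # Every improvement enqueues the pixel again.
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
                    pushes += 1
    return dist, pushes


# One moderately expensive cell sits between the source and (0, 2).
friction = [[1.0, 4.0, 1.0],
            [1.0, 1.0, 1.0]]
dist, pushes = dijkstra_push_count(friction, (0, 0))
print(pushes, "pushes for", 2 * 3, "pixels")  # 7 pushes for 6 pixels
```

Cell (0, 2) is first reached through the expensive cell at cost 5.0, then improved to 4.0 through the bottom row, so it is pushed twice. A heap array sized at height * width (6 slots) would be too small if stale entries accumulate before they are popped.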
Fix
Size the heap to the real upper bound on pushes:
- _cost_distance_kernel: height * width * (n_neighbors + 1) (directed edges plus one seed per pixel).
- _cost_distance_tile_kernel: the same, plus 2 * (width + height) + 4 for the phase-2 boundary seeds.

The worst observed push count is about 1.9x height * width, well inside n_neighbors * height * width.

Tests
Two regression tests compare the numpy and iterative-dask paths against a reference heapq Dijkstra (unbounded heap, so it cannot overflow) over many random grids with strongly varying friction and uneven chunks. Before the fix the iterative test aborts the process; after it, all 88 tests in test_cost_distance.py pass.

Verified on a CUDA host: numpy, cupy, dask+numpy, and dask+cupy all agree across 30-40 random adversarial grids with barriers (0 / NaN / Inf friction), multiple sources, both connectivities, and finite/infinite max_cost.
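For reference, the two capacity bounds from the Fix section can be written out as plain arithmetic. This is a sketch only: the helper names kernel_heap_capacity and tile_kernel_heap_capacity are hypothetical and do not appear in the patch, which computes the sizes inside the kernels.

```python
def kernel_heap_capacity(height, width, n_neighbors):
    """Bound for the single-array kernel (hypothetical helper).

    Each pixel can be seeded once and can be pushed at most once per
    incoming directed edge, giving height * width * (n_neighbors + 1).
    """
    return height * width * (n_neighbors + 1)


def tile_kernel_heap_capacity(height, width, n_neighbors):
    """Bound for the tile kernel (hypothetical helper).

    Same as above, plus headroom for the phase-2 boundary seeds:
    2 * (width + height) + 4.
    """
    base = kernel_heap_capacity(height, width, n_neighbors)
    return base + 2 * (width + height) + 4


# Worked numbers for a 100 x 100 tile with 8-connectivity.
full = kernel_heap_capacity(100, 100, 8)       # 90000
tile = tile_kernel_heap_capacity(100, 100, 8)  # 90404
print(full, tile)
```

Compared with the old height * width sizing (10000 slots for the same tile), the new bound is nine times larger with 8-connectivity, which trades some memory for a guarantee that pushes stay in bounds.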