Test using pytest-run-parallel and related fixups in the tests #2194

Draft

seberg wants to merge 17 commits into NVIDIA:main from seberg:ft-testing

seberg commented Jun 10, 2026

Description

This is the full follow-up to gh-2162, to give the complete picture. I tried to split the commits up roughly and could split them into individual PRs as well.

I am planning to have another look through it myself (to see if I can think of a nicer pattern than the current mini plugins).

The buffer closing/sync problem is, I think, an upstream issue; I have opened a bug for it.

Another reason to split things up and get started: I rebased, and of course there are new issues :).

Checklist

New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot Bot commented Jun 10, 2026

Contributor

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions Bot added cuda.bindings cuda.core cuda.pathfinder labels

seberg changed the title ~~Ft testing~~ Test using pytest-run-parallel and related fixups in the tests

seberg force-pushed the ft-testing branch from 84e10a4 to a9ef88d Compare

June 10, 2026 20:34

leofang assigned seberg

leofang self-requested a review

June 11, 2026 02:55

seberg force-pushed the ft-testing branch from 065330f to ed40f60 Compare

June 11, 2026 13:05

seberg commented Jun 11, 2026

Author

/ok to test ed40f60

github-actions Bot commented Jun 11, 2026

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-2194/
https://nvidia.github.io/cuda-python/pr-preview/pr-2194/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-2194/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-2194/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

seberg added 16 commits

June 11, 2026 16:05


          Fix some threading issues (some free-threading related)

d55cd43

This fixes a few threading issues, but we may want to discuss some
details still.
* The GraphNode cleanup order is an important fix. Another thread may
  end up with the same pointer (but new object) as soon as we clean it
  up.  So we have to remove it from the cache before cleaning it up.
* Use of atomics: I think this is needed, but for this one place
  an atomic seemed more reasonable.  (However, it is hard to test, and
  if it can fail at all then IIUC only on ARM.)
* The critical sections should be pretty safe.  I am not sure they
  will all ensure that the object is always the _identity_ but I am
  pretty sure it protects from worse races.
  (Testing did find this for MemPool.attributes, not others yet.
  Testing with thread-sanitizer might flush out some...)
* The split mutex: This is thread-unsafe.  But I am honestly not
  sure if that isn't just expected, or whether the mutex is good
  but it should also be safe from within CUDA.
* Using the `setdefault` caching pattern is largely just normalization.
  Without the `return dict.setdefault` a different instance may be
  returned on different threads (or a cache entry replaced).
  For `cyGraphMemoryResource` this triggered a test failure with
  pytest-run-parallel, although that doesn't mean it is problematic as such.
  `cuda-pathfinder` uses functools.cache, but usually for strings;
  the one we may want to look at is `load_nvidia_dynamic_lib`.

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
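
The `setdefault` point in the commit message above can be illustrated with a small sketch. It is not the code from this PR: `Resource`, `_cache`, `get_resource`, and `fetch_from_threads` are hypothetical names, and the sketch assumes that a single `dict.setdefault` call on a built-in `dict` with string keys behaves atomically in CPython.

```python
import threading

_cache = {}  # hypothetical module-level cache: key -> shared instance


class Resource:
    """Hypothetical stand-in for an object that should exist once per key."""

    def __init__(self, key):
        self.key = key


def get_resource_racy(key):
    # Check-then-set: two threads can both miss and both store, so the
    # later store replaces the earlier one and callers may end up
    # holding different instances.
    if key not in _cache:
        _cache[key] = Resource(key)
    return _cache[key]


def get_resource(key):
    # A spare Resource may still be constructed here, but setdefault
    # returns whichever instance was stored first, so all callers agree.
    return _cache.setdefault(key, Resource(key))


def fetch_from_threads(n_threads=8, key="device-0"):
    """Call get_resource from several threads and collect the results."""
    barrier = threading.Barrier(n_threads)
    results = []

    def worker():
        barrier.wait()  # line the threads up to make a race more likely
        results.append(get_resource(key))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The trade-off is that the constructor may run more than once; only the identity of the returned object is stabilized.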


          Use C++ bool for atomic

0311c2b

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          Forgot to commit a critical section on this branch

ef66665

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          DEV: run_tests.sh uses pytest-run-parallel and install it on 3.14t and…

c98ca0e

… 3.15t

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          TST: Add mini plugins to push custom fixtures into tests

da81367

E.g. CUDA needs to be initialized for each thread, but fixtures
run before pytest-run-parallel launches the threads.
So we create a mini-plugin to deal with this.  We could also solve
this with decorators in many cases, but that would require adding
a lot of decorators...

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
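
The problem described in this commit (setup done in the main thread does not carry over to worker threads) can be sketched with the standard library alone. This is not the mini-plugin from the PR and uses no pytest-run-parallel API: `ensure_thread_initialized`, `run_in_threads`, and `init_calls` are hypothetical names, and the "initialization" is only a thread-local flag standing in for per-thread CUDA setup.

```python
import threading

_tls = threading.local()  # per-thread state; invisible to other threads
init_calls = []           # one entry per thread that ran the setup


def ensure_thread_initialized():
    """Run the per-thread setup at most once in the calling thread."""
    if not getattr(_tls, "ready", False):
        init_calls.append(threading.current_thread().name)
        _tls.ready = True


def run_in_threads(test_func, n_threads=4):
    """Hypothetical runner: execute one test body in several threads."""
    errors = []

    def worker():
        try:
            # Must run inside the worker; doing it in the main thread
            # beforehand would not set the flag for this thread.
            ensure_thread_initialized()
            test_func()
        except Exception as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]


def example_test_body():
    # Passes only if the setup ran in the current thread.
    assert _tls.ready
```

Injecting the setup call from one place, as the runner does here, is what avoids decorating every test individually.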


          TST: Mark tests as thread-unsafe or limit the number of threads

991d5bc

- thread_unsafe: nvml init ref-count, graphMem attr, mock-based tests,
  OpenGL, peer-access pool state, multiprocessing warning, program-cache
  race reproduction, and functools.cache mutation tests
- parallel_threads_limit: IPC / worker-pool tests that spawn subprocesses
  or open file descriptors (limit 4), example tests (limit 8), and the
  event-registration test whose timeouts are slow

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
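
For readers unfamiliar with the two markers named in this commit, a minimal sketch of how they are applied follows. The marker names and the limit of 4 come from the commit message; the test names and bodies are hypothetical, and the exact marker semantics should be checked against the pytest-run-parallel documentation.

```python
import pytest


@pytest.mark.thread_unsafe
def test_touches_global_state():
    # Hypothetical test that mutates process-wide state; the marker asks
    # the plugin not to run it in several threads at once.
    pass


@pytest.mark.parallel_threads_limit(4)
def test_spawns_subprocesses():
    # Hypothetical test that is safe in parallel but expensive per
    # thread, so the thread count is capped (4, as in the commit message).
    pass
```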


          TST: use tmp_path fixture in cufile (and mark some as unsafe)

84a63e2

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          TST: Move graph definitions inline and mark "global" ones as thread-…

…unsafe always

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          TST: Fixup memory tests, mostly work around issue when tearing down m…

58e7f2f

…empool

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          TST: Thread unsafe markers for test_managed_ops

a5988c4

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          Avoid interactive backend when using run_tests.sh locally

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          Use indirect fixtures for a nicer pattern and avoid thread issues

4fe255e

After my first AI try was a crazy mess, the second run actually found
a neat solution...
These objects can be created in the main thread, but we can't create
them on the fly in many threads as was done before...

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
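
As a rough illustration of the indirect-fixture pattern, here is a sketch using plain pytest. It is not the code from this commit: the `buffer` fixture and the sizes are hypothetical. The idea is that the parametrized value is passed to a fixture through `request.param`, so the object is built during fixture setup rather than inside the test body; an earlier commit message in this PR notes that fixtures run before pytest-run-parallel launches the threads.

```python
import pytest


@pytest.fixture
def buffer(request):
    # With indirect=True the parametrized value arrives as request.param
    # and the object is constructed during fixture setup.
    size = request.param
    return bytearray(size)


@pytest.mark.parametrize("buffer", [16, 256], indirect=True)
def test_buffer_size(buffer):
    # The test body only receives the ready-made object.
    assert len(buffer) in (16, 256)
```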


          Make latch-kernel helper compile only once

831eefb

For some reason the latch kernel helper test started failing now
(it did not before my update from CUDA 13.2 to 13.3?).

The reason isn't that it is not thread-safe, but that something
(presumably module loading/unloading) causes synchronizations, which
in turn force threads to wait for their LatchKernel to finish.

And of course the test itself really needs that not to happen.
Making sure there is only one LatchKernel, compiled and loaded exactly
once, seems to avoid this problem.

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          Update pyi files (although I find it strange to include @cython.criti…

f6d9073

…cal_section

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          TST: Remove XFAIL(strict) in cufile on CI (it seems to pass now...)

4e636a0

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>


          Install pytest-run-parallel explicitly in CI

eb6a2ff

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>

seberg force-pushed the ft-testing branch from ed40f60 to eb6a2ff Compare

June 11, 2026 14:05

github-actions Bot added the CI/CD label

seberg commented Jun 11, 2026

Author

Fun, the refactor made the cufile xfail-strict tests pass on CI, but I didn't set up the parallel run correctly... one more try:

/ok to test eb6a2ff

seberg commented Jun 11, 2026

Author

/ok to test eb6a2ff


          Move pytest-run-parallel setup (and hopefully actually make it work)

7b59bff

Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>

seberg force-pushed the ft-testing branch from 42bc9cd to 7b59bff Compare

June 11, 2026 14:56

seberg commented Jun 11, 2026

Author

/ok to test 7b59bff


Labels

CI/CD cuda.bindings cuda.core cuda.pathfinder