Test using pytest-run-parallel and related fixups in the tests#2194
Draft
seberg wants to merge 17 commits into
Draft
Test using pytest-run-parallel and related fixups in the tests#2194seberg wants to merge 17 commits into
seberg wants to merge 17 commits into
Conversation
Contributor
Author
|
/ok to test ed40f60 |
|
This fixes a few threading issues, but we may want to discuss some details still. * The GraphNode cleanup order is an important fix. Another thread may end up with the same pointer (but new object) as soon as we clean it up. So we have to remove it from the cache before cleaning it up. * Use of atomics: I think this is needed, but for this one place an atomic seemed more reasonable. (However, hard to test and if it can fail IIUC only on ARM.) * The critical sections should be pretty safe. I am not sure they will all ensure that the object is always the _identity_ but I am pretty sure it protects from worse races. (Testing did find this for MemPool.attributes, not others yet. Testing with thread-sanitizer might flush out some...) * The split mutex: This is thread-unsafe. But I am honestly not sure if that isn't just expected, or whether the mutex is good but it should also be safe from within CUDA. * Use of `setdefault` cached pattern is largely just normalizing. Without the `return dict.setdefault` a different instance may be returned on different threads (or a cache entry replaced). For the `cyGraphMemoryResource` that triggered a test with pytest-run-parallel although that doesn't mean it is problematic as such. `cuda-pathfinder` uses functools.cache, but usually for strings; the one we may want to look at is `load_nvidia_dynamic_lib`. Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
… 3.15t Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
E.g. cuda needs to be initialized for each thread, but fixtures run before pytest-run-parallel launches the threads. So we create a mini-plugin to deal with this. We could also solve this with decorators in many cases, but that would require adding a lot of decorators... Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
- thread_unsafe: nvml init ref-count, graphMem attr, mock-based tests, OpenGL, peer-access pool state, multiprocessing warning, program-cache race reproduction, and functools.cache mutation tests - parallel_threads_limit: IPC / worker-pool tests that spawn subprocesses or open file descriptors (limit 4), example tests (limit 8), and the event-registration test whose timeouts are slow Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
…unsafe always Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
…empool Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
After my first AI try was a crazy mess, the second run actually found a neat solution... These objects can be created in the main thread, but we can't create them on the fly in many threads as it was... Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
For some reason the latch kernel helper test started failing now (it did not before my update from CUDA 13.2 to 13.3?). The reason isn't that it is not thread-safe, but that something (presumably module loading/unloading) causes synchronizations which in turn cause threads having to wait on their LatchKernel to finish. And of course the test itself really needs that not to happen. Making sure there is only one LatchKernel compiled and loaded exactly once seems to avoid this problem. Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
…cal_section Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Author
|
Fun, the refactor made the cufile xfail-strict tests pass on CI, but I didn't set up the parallel run correctly... one more try: /ok to test eb6a2ff |
Author
|
/ok to test eb6a2ff |
Signed-off-by: Sebastian Berg <sebastianb@nvidia.com>
Author
|
/ok to test 7b59bff |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This is the full follow up to gh-2162 for a full picture. I tried to split commits up roughly and could split them into individual PRs as well.
I am planning to have another look through myself once (see if I can think of a nicer pattern than the current mini plugins).
The buffer closing/sync is an upstream issue I think that I have opened a bug for.
And another reason to split things up and get started: rebased and of course there are new issues :).
Checklist