test: isolate install-family tests from process-global env bleed #283

Merged
ScriptedAlchemy merged 16 commits into master from fix/test-isolation-robustness
Jul 5, 2026

@ScriptedAlchemy

Owner

Fixes a pre-existing test flake (install/uninstall tests race on the memory_digest_targets.json atomic rename under in-process cargo test).

Root cause

The install path resolves the profile root from TRACEDECAY_DATA_DIR when it is set and falls back to <home>/.tracedecay only when it is unset (profile_root_for_agent_home, skill_targets.rs:74). Under cargo test, every test in a test binary shares one process and therefore one environment. When a sibling test pins the variable, concurrent installs that did not pin it resolve to the same shared path and race on writes to <shared>/agent_managed/memory_digest_targets.json: one test's atomic sibling rename collides with a peer's directory removal. The flake reproduced in about 36% of in-process runs. nextest (the CI harness) is immune because it runs each test in its own process, which is why CI never caught it.
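A minimal sketch of the resolution rule described above, to make the collision concrete. The names resolve_profile_root and digest_targets_path are hypothetical stand-ins, not the real profile_root_for_agent_home, and the override is passed as a parameter instead of being read from the environment so the sketch stays deterministic.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the resolution rule: an explicit data-dir
/// override wins; otherwise fall back to `<home>/.tracedecay`.
fn resolve_profile_root(data_dir_override: Option<&Path>, home: &Path) -> PathBuf {
    match data_dir_override {
        Some(dir) => dir.to_path_buf(),
        None => home.join(".tracedecay"),
    }
}

/// Location of the digest-targets file under a resolved profile root.
fn digest_targets_path(profile_root: &Path) -> PathBuf {
    profile_root
        .join("agent_managed")
        .join("memory_digest_targets.json")
}
```

With no override, two tests with different homes get different roots. With the override set by a sibling test, both resolve to the same root and therefore write the same memory_digest_targets.json, which is the shared file the race happens on.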

Fix (test-only, zero product change)

Adds an AgentEnvLock RAII helper and applies it to the install/uninstall tests that lacked isolation. The helper locks the process-global PROCESS_ENV_LOCK and pins TRACEDECAY_DATA_DIR to the test's own home. That is the exact value the unset fallback produces, so no assertions change. Verified: the full workspace is green and a heavy in-process stress run is stable.
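For readers who have not opened the diff, here is a rough sketch of what a lock-and-pin RAII guard of this kind can look like. It is an illustration under assumptions, not the code in this PR: the struct fields, the pin signature, and the choice to pin to `<home>/.tracedecay` are guesses, and only the names AgentEnvLock, PROCESS_ENV_LOCK and TRACEDECAY_DATA_DIR come from the description.

```rust
use std::env;
use std::ffi::OsString;
use std::path::Path;
use std::sync::{Mutex, MutexGuard, PoisonError};

// Assumed shape: one process-wide lock that every env-mutating test takes.
static PROCESS_ENV_LOCK: Mutex<()> = Mutex::new(());

const DATA_DIR_VAR: &str = "TRACEDECAY_DATA_DIR";

/// Illustrative guard: holds the lock and remembers the previous value.
struct AgentEnvLock {
    previous: Option<OsString>,
    // Released only after `Drop::drop` below has restored the variable,
    // because fields are dropped after the struct's own `drop` runs.
    _guard: MutexGuard<'static, ()>,
}

impl AgentEnvLock {
    // `set_var` is an unsafe fn in edition 2024 and safe in 2021; the allow
    // keeps the sketch warning-free on both.
    #[allow(unused_unsafe)]
    fn pin(home: &Path) -> Self {
        // Tolerate poisoning so one panicking test does not fail the rest.
        let guard = PROCESS_ENV_LOCK
            .lock()
            .unwrap_or_else(PoisonError::into_inner);
        let previous = env::var_os(DATA_DIR_VAR);
        // Assumption: pin to the path the unset fallback would produce.
        unsafe { env::set_var(DATA_DIR_VAR, home.join(".tracedecay")) };
        AgentEnvLock { previous, _guard: guard }
    }
}

impl Drop for AgentEnvLock {
    #[allow(unused_unsafe)]
    fn drop(&mut self) {
        // Put the environment back exactly as it was before `pin`.
        unsafe {
            match self.previous.take() {
                Some(value) => env::set_var(DATA_DIR_VAR, value),
                None => env::remove_var(DATA_DIR_VAR),
            }
        }
    }
}
```

A test would hold the guard for its whole body, for example `let _env = AgentEnvLock::pin(dir.path());`. Because the pinned value equals the fallback value, path assertions written against the unpinned behavior keep passing.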

Note: the task also mentioned a git_watch debounce flake, but git_watch.rs doesn't exist on master. It is part of PR #280's auto-sync work and is already green there, so it is out of scope for this master-based fix.

Recovery

Session 2c51d204-3565-4a10-833d-d8fbd51620c3 · workflow wf_bae0c1a3-0b3 · fact 41 (CI-gate lesson)

🤖 Generated with Claude Code

The install path resolves profile root via TRACEDECAY_DATA_DIR when set,
falling back to <home>/.tracedecay only when unset. Under in-process
cargo test, a sibling test that pins the env made concurrent unpinned
installs resolve to the shared path and race the memory_digest_targets
atomic rename. Adds an AgentEnvLock RAII helper (locks PROCESS_ENV_LOCK +
pins the env to the test's own home — the exact unset-fallback value, so
no assertions change) and applies it to the install/uninstall tests that
lacked isolation. Test-only; verified full workspace green + stress-stable.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@changeset-bot

changeset-bot Bot commented Jul 4, 2026

⚠️ No Changeset found

Latest commit: b807292

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

home is &Path (dir.path()) at these sites, so pin(&home) is a needless
borrow (clippy, blocking policy); PathBuf sites keep the borrow.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
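To illustrate the needless-borrow cleanup in the commit above: a small sketch, assuming a pin-style function that takes `&Path`. The function names here are hypothetical and the real signature may differ.

```rust
use std::path::{Path, PathBuf};

// Hypothetical signature standing in for the real `pin`.
fn pin(home: &Path) -> PathBuf {
    home.join(".tracedecay")
}

fn pin_from_borrowed(home: &Path) -> PathBuf {
    // `home` is already `&Path`, so it is passed through unchanged.
    // Writing `pin(&home)` would build a `&&Path` that the compiler
    // immediately dereferences, which is what clippy flags.
    pin(home)
}

fn pin_from_owned(home: PathBuf) -> PathBuf {
    // `home` is an owned `PathBuf`, so the borrow is required here.
    pin(&home)
}
```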
ScriptedAlchemy and others added 14 commits July 4, 2026 19:56
Migrate the two call sites from the local install_env_lock() +
pinned_profile_storage() pair to crate::common::AgentEnvLock::pin (the
canonical bundle this PR introduced), and delete the now-duplicate local
helpers and their now-unused imports. Same lock+pin behavior.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Extract lock_recovering_poison/lock_global_db_env for poison-tolerant
GLOBAL_DB_ENV_LOCK acquires, migrate Hermes update-plugin tests to
AgentEnvLock, and drop the one-off hermes_env_guard helper.
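A poison-tolerant acquire like the one named in this commit is commonly written as below. This is a sketch under assumptions: the helper names and GLOBAL_DB_ENV_LOCK come from the commit message, but the signatures and the `Mutex<()>` payload are guesses.

```rust
use std::sync::{Mutex, MutexGuard, PoisonError};

// Assumed shape of the shared lock.
static GLOBAL_DB_ENV_LOCK: Mutex<()> = Mutex::new(());

/// Acquire `mutex`, taking the guard even if a previous holder panicked.
/// The lock only serializes env access and protects no invariant-bearing
/// data, so ignoring the poison flag is reasonable in this sketch.
fn lock_recovering_poison<T>(mutex: &Mutex<T>) -> MutexGuard<'_, T> {
    mutex.lock().unwrap_or_else(PoisonError::into_inner)
}

/// Convenience wrapper for the process-global lock.
fn lock_global_db_env() -> MutexGuard<'static, ()> {
    lock_recovering_poison(&GLOBAL_DB_ENV_LOCK)
}
```

Without the recovery step, one test that panics while holding the lock would make every later `lock().unwrap()` in the same process panic too, turning a single failure into a cascade.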

Parallel lib unit tests that store MCP response handles under the
process-global TRACEDECAY_DATA_DIR profile tree raced on profile-sharded
response-handles/ directories, causing intermittent store failures.

Parallel lib tests used separate mutexes for TRACEDECAY_DATA_DIR pins,
letting hook analytics and profile resolution tests race and flake.

Add lock_test_env alias, Claude install profile pinning, and route
remaining config/daemon/handler env overrides through the shared lock
with poison recovery.

Handle-store lib tests must serialize with TRACEDECAY_DATA_DIR mutators;
otherwise profile resolution races and truncation handles fail to persist.

TraceDecay init/index tests shared TRACEDECAY_DATA_DIR with parallel env
mutators; pin an isolated profile under the shared test lock.

@ScriptedAlchemy ScriptedAlchemy merged commit 38375e2 into master Jul 5, 2026
16 checks passed