Content Exclusion Service fails closed (blocks all shell commands) after auth/token refresh: use-after-dispose (v1.0.61) #3757

[Copilot speaking]

## Affected version
GitHub Copilot CLI **v1.0.61** (`pkg/win32-x64/1.0.61/app.js`). Identified by reverse‑engineering the shipped bundle and correlating multiple session logs.

## Summary
After a credential/token update during a session, the `ContentExclusionService` is disposed but a **dangling, unretained reference** to it remains in the cached shell tool's config. The disposed service **fails closed**, so every subsequent shell command in that session is blocked as "targeting an excluded path" — even when the org has **zero** exclusion rules, and even for non‑file tokens like `git`, `Set-Location`, or `C:\Windows\System32\cmd.exe`. The session never recovers.

## Observable symptoms
- Logs show `[ContentExclusionService] Loaded 0 exclusion rule(s)` (no rules configured), yet later:
  `[ContentExclusion] Blocked shell command targeting excluded path: <path>`
- The blocked "paths" are nonsensical — they're the first token of the command resolved against the cwd (e.g. `<cwd>\git`, `<cwd>\Set-Location`), including interpreters like `C:\Windows\System32\cmd.exe`.
- Blocking begins **mid‑session** and persists for the rest of the session (no recovery).
- Strong temporal correlation with auth instability / re‑auth (`/login`) prompts.
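
One way to check a session log for this signature is to look for a block message that appears while the most recent rule load reported zero rules. The following is a minimal sketch, assuming plain-text logs that contain the two messages quoted above; the helper name `findSpuriousBlocks` and the sample lines are hypothetical and not taken from the CLI.

```js
// Hypothetical triage helper: flags "Blocked shell command" lines that occur
// while the most recently logged rule count was 0.
const LOADED_RE = /\[ContentExclusionService\] Loaded (\d+) exclusion rule\(s\)/;
const BLOCKED_RE = /\[ContentExclusion\] Blocked shell command targeting excluded path: (.*)$/;

function findSpuriousBlocks(logText) {
  let lastRuleCount = null; // stays null until a "Loaded N" line is seen
  const hits = [];
  logText.split(/\r?\n/).forEach((line, index) => {
    const loaded = LOADED_RE.exec(line);
    if (loaded) {
      lastRuleCount = Number(loaded[1]);
      return;
    }
    const blocked = BLOCKED_RE.exec(line);
    if (blocked && lastRuleCount === 0) {
      hits.push({ line: index + 1, path: blocked[1].trim() });
    }
  });
  return hits;
}

// Sample input, made up for illustration; real log lines carry more fields.
const sampleLog = [
  "[ContentExclusionService] Loaded 0 exclusion rule(s)",
  "[ContentExclusion] Blocked shell command targeting excluded path: C:\\work\\git",
].join("\n");

console.log(findSpuriousBlocks(sampleLog));
```

A non-empty result only indicates that a block occurred with no rules loaded; it does not by itself confirm the root cause described below.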

## Root cause (precise chain, with bundle anchors)

1. **Service is single‑owner, refcount‑based.** `kat` initializes `retainCount=1`; `dispose()` does `retainCount--; if(retainCount<=0) disposed=true`. The **only** `retain()` call in the bundle is inside `setContentExclusionService(e,r)` (`…&& !r && e.retain()`). The lazy creator calls `setContentExclusionService(_hn(...), /*skipRetain*/ true)`, so a freshly‑created service stays at **refcount 1** — the session manager's field is its sole owner.

2. **The shell tool captures the service without retaining it.** `getToolConfig()` builds the tool config with `contentExclusionService: this.contentExclusionService` — a plain **value snapshot, no `retain()`**. The shell tool (`Umt`) stores it as `this.config` and is **cached for the session** as `this.shellContext` (only rebuilt on an `enableScriptSafety` change, *not* on auth change).

3. **An auth change disposes the service out‑of‑band.** `updateOptions(e)` (the config‑delta handler) disposes + nulls the service whenever any of these change:
   - `authInfo`: `this.authInfo !== e.authInfo && this.setContentExclusionService(void 0, false)` — a **reference** comparison, so *any* new authInfo object (token refresh, `/login`, session reattach) triggers it.
   - the `CONTENT_EXCLUSION` feature flag toggles.
   - `additionalContentExclusionPolicies` changes.

   `updateOptions({authInfo})` is invoked externally by the **`setCredentials` RPC**: `setCredentials(e){ return t.updateOptions({authInfo:e.credentials}), {success:true} }`. This can fire **at any time, including mid‑turn**. Because the shell tool never retained the service, `setContentExclusionService(void 0,…)` drops refcount 1 → 0 → `disposed=true` **while the shell tool still references it**.

4. **The use site has no liveness guard.** `Umt.executeShellToolCallback` does:
   ```js
   let a = this.config.contentExclusionService;          // dangling, disposed instance
   if (a && s.possiblePaths.length > 0) {
     let u = await a.findFirstExcluded(s.possiblePaths…);   // no isUnavailable() check, no re-create
     if (u) { /* log "Blocked shell command…"; block */ }
   }
   ```
   No `isUnavailable()` check, no re‑creation. The only re‑create guard lives in the per‑turn manager path and rebuilds **the manager's own field**, never the cached shell tool's snapshot.

5. **Disposed → fail closed.** On a disposed instance:
   ```js
   async findFirstExcluded(e){
     if (this.isUnavailable()) {
       let n = this.getUnavailableResult();  // {excluded:true, reason:"Content exclusion rules could not be fetched…"}
       return n.excluded && e.length>0 ? { path: e[0], result: n } : null;  // ← first token, unconditionally
     } … }
   ```
   Hence every command is blocked and the reported "excluded path" is just `possiblePaths[0]`.

**Net:** disposing the sole owner's reference on auth change, with (a) an unretained capture in the cached shell tool and (b) no liveness check at the use site, yields a use‑after‑dispose that fails closed for the remainder of the session.
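
The chain can be reproduced in a small standalone model. This is a sketch under stated assumptions, not bundle code: `SessionManager` and `ShellTool` are hypothetical stand-ins for the session manager and `Umt`, `findFirstExcluded` is synchronous here for brevity, and path extraction is reduced to taking the first token.

```js
// Simplified model of the use-after-dispose chain. Method names follow the
// report where it gives them; the bodies are illustrative only.
class ContentExclusionService {
  constructor() {
    this.retainCount = 1; // the creator is the sole owner
    this.disposed = false;
  }
  retain() { this.retainCount++; }
  dispose() {
    this.retainCount--;
    if (this.retainCount <= 0) this.disposed = true;
  }
  isUnavailable() { return this.disposed; }
  findFirstExcluded(paths) {
    // Fail closed: an unavailable service reports the first candidate.
    if (this.isUnavailable()) {
      return paths.length > 0 ? { path: paths[0] } : null;
    }
    return null; // zero rules configured, so nothing is excluded
  }
}

class SessionManager { // hypothetical name
  constructor() {
    this.authInfo = { token: "t1" };
    this.contentExclusionService = new ContentExclusionService();
  }
  getToolConfig() {
    // Value snapshot with no retain(): the tool never becomes an owner.
    return { contentExclusionService: this.contentExclusionService };
  }
  updateOptions(options) {
    // Reference comparison: any new authInfo object disposes the service.
    if (this.authInfo !== options.authInfo && this.contentExclusionService) {
      this.contentExclusionService.dispose();
      this.contentExclusionService = undefined;
    }
    this.authInfo = options.authInfo;
  }
}

class ShellTool { // hypothetical stand-in for Umt
  constructor(config) { this.config = config; } // cached for the session
  execute(command) {
    const service = this.config.contentExclusionService; // may be disposed
    const possiblePaths = command.split(/\s+/).slice(0, 1); // crude stand-in
    const hit = service && possiblePaths.length > 0
      ? service.findFirstExcluded(possiblePaths)
      : null;
    return hit ? `blocked: ${hit.path}` : "ran";
  }
}

const manager = new SessionManager();
const shell = new ShellTool(manager.getToolConfig());
const before = shell.execute("git status");
manager.updateOptions({ authInfo: { token: "t1" } }); // same token, new object
const after = shell.execute("git status");
console.log(before, "|", after); // prints: ran | blocked: git
```

In this model the manager's field is cleared, but the tool's snapshot still points at the disposed instance, which is why the second call is blocked even though no rules exist.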

## Why it correlates with re‑authentication
The disposal trigger is `this.authInfo !== e.authInfo` (reference inequality). Routine token refresh, `/login`, and session reattach all push a new `authInfo` object via `setCredentials`, disposing the service. Frequent re‑auth prompts and the spurious blocking are **two symptoms of the same credential churn** — causally linked, not coincidental.
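
The effect of the reference comparison can be shown in isolation. In this sketch the `token` and `login` fields and the `sameIdentity` helper are hypothetical; the real shape of `authInfo` is not known from the bundle excerpts above.

```js
// Two credential objects with identical contents are still distinct references.
const current = { token: "abc", login: "octocat" };   // hypothetical shape
const refreshed = { token: "abc", login: "octocat" }; // e.g. rebuilt by an RPC handler

// What the report describes: strict inequality on the objects themselves.
const changedByReference = current !== refreshed;

// Hypothetical value-based alternative: compare the fields that matter.
function sameIdentity(a, b) {
  return a?.token === b?.token && a?.login === b?.login;
}
const changedByValue = !sameIdentity(current, refreshed);

console.log({ changedByReference, changedByValue });
// prints: { changedByReference: true, changedByValue: false }
```

A value-based check would still report a change when the token itself rotates, so it narrows the trigger rather than removing it.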

## Evidence (anonymized)
Across the session logs reviewed, blocking appeared in only a small number of sessions (tabulated below); none of the others ever blocked. The blocking sessions share a common signal:

| Session | Blocks | Token 401s | Service re‑inits |
|---|---:|---:|---:|
| A | 8 | many | 2 |
| B | 6 | 1 | 3 |
| C | 4 | 0 | 4 (two within ~1 s) |

- **None** of the non‑blocking sessions showed a token 401; the two auth‑unstable sessions (A, B) both blocked.
- The blocking session with **no** 401 (C) instead showed the **highest service churn** (rapid re‑inits, two within ~1 s) — consistent with a non‑auth config update (flag/policy) or an `authInfo` object swap landing mid‑turn.
- In every blocking session there were **zero** service re‑inits after the first block → it never recovers in‑session.
- Representative timing: a token 401, then a service re‑init shortly after, then blocking begins ~1–2 minutes later and continues until the turn/session ends.

## Suggested fixes (any one breaks the chain; ideally all three)
1. **Retain on capture / use a live accessor.** Have the tool config `retain()` the service, or read it via a live getter (`getContentExclusionService()`) instead of a value snapshot, so the instance can't be disposed while in use.
2. **Guard the use site.** In `executeShellToolCallback`, treat `isUnavailable()` like "no service" (skip the check or lazily re‑create), rather than calling `findFirstExcluded` on a possibly‑disposed instance.
3. **Don't dispose on benign authInfo churn.** Compare auth identity by value (token/identity), not object reference, before disposing — and/or recreate the service atomically instead of disposing then waiting for the next turn.
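
Fixes 1 and 2 could look roughly like the following. This is a sketch, not a patch against the bundle: `StubService`, `makeToolConfig`, and `checkCommand` are hypothetical names, and only `isUnavailable()` and `findFirstExcluded()` come from the report.

```js
// Hypothetical stub that mimics the fail-closed behaviour of a disposed service.
class StubService {
  constructor(disposed) { this.disposed = disposed; }
  isUnavailable() { return this.disposed; }
  async findFirstExcluded(paths) {
    return this.isUnavailable() && paths.length > 0 ? { path: paths[0] } : null;
  }
}

// Fix 1: give the tool a live getter instead of a snapshot, so it always sees
// the manager's current instance (or undefined after disposal).
function makeToolConfig(manager) {
  return { getContentExclusionService: () => manager.contentExclusionService };
}

// Fix 2: treat an unavailable service like "no service" at the use site.
async function checkCommand(config, possiblePaths) {
  const service = config.getContentExclusionService();
  if (!service || service.isUnavailable() || possiblePaths.length === 0) {
    return null; // nothing to enforce
  }
  return service.findFirstExcluded(possiblePaths);
}

// Usage: a disposed instance no longer blocks the command.
const demoManager = { contentExclusionService: new StubService(true) };
checkCommand(makeToolConfig(demoManager), ["git"]).then((hit) => {
  console.log(hit === null ? "ran" : `blocked: ${hit.path}`);
});
```

Skipping the check in the guard means the tool fails open while the service is unavailable; lazily re-creating the service at that point would keep enforcement intact, at the cost of an extra fetch.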

Bonus: when failing closed, don't report `possiblePaths[0]` as an "excluded path" if it isn't an existing absolute file path — the current behavior is misleading and blocks shell builtins/interpreters.
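
A possible shape for that reporting change is sketched below. The helper names and the fallback message wording are hypothetical; `path.isAbsolute` and `fs.existsSync` are standard Node.js calls.

```js
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: only name a path in the message when the first
// candidate is an absolute path that exists on disk.
function reportablePath(possiblePaths) {
  const first = possiblePaths[0];
  if (typeof first !== "string") return null;
  return path.isAbsolute(first) && fs.existsSync(first) ? first : null;
}

// Builds the log line; the fallback wording is an assumption for illustration.
function blockMessage(possiblePaths) {
  const reported = reportablePath(possiblePaths);
  return reported
    ? `Blocked shell command targeting excluded path: ${reported}`
    : "Blocked shell command: content exclusion rules are unavailable";
}

console.log(blockMessage(["git"]));
```

With this, a token such as `git` or `<cwd>\git` would no longer be presented as an excluded file, which makes the fail-closed state easier to recognize in logs.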

## Workaround for users
Restart the CLI session when blocking begins — a fresh process re‑creates the service cleanly. Updating to a newer CLI build (if available) is advisable.

