fix(agent): Surface real session-init errors and kill leaked CLIs#2654
Merged
Conversation
Member
Author
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
React Doctor found no issues in the changed files. 🎉 Reviewed by React Doctor for commit |
Contributor
Prompt To Fix All With AIFix the following 1 code review issue. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 1
packages/agent/src/gateway-models.test.ts:112-138
**`fetchModelsList` timeout not covered by tests**
`fetchModelsList` received the same `AbortSignal.timeout` fix as `fetchGatewayModels`, but only `fetchGatewayModels` has a test verifying the timeout degrades to `[]`. A stalled gateway would leave `fetchModelsList` hanging silently in production and the fix would go undetected if it regressed. The two functions are structurally identical, so a single parameterised test covering both would be cleaner and complete coverage.
Reviews (1): Last reviewed commit: "surface real agent init errors and kill ..." | Re-trigger Greptile |
This was referenced Jun 13, 2026
New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.
There was a problem hiding this comment.
Clean bug-fix with test coverage for all changed paths; the process-leak and error-masking fixes are correct, the gateway fetch timeout is properly bounded, and the resolved bot comment about fetchModelsList coverage is directly addressed by the new parameterized test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Problem
Connecting to an agent could fail with a generic "Internal error" that named no cause, retried every ~30s, and left orphaned
claudesubprocesses behind. The real failure — a 30s session-init timeout inClaudeAcpAgent.createSession— was thrown as a plainError, which the ACP layer collapses into a JSON-RPC-32603 "Internal error"(real text buried indata.details) before it reaches the logs or UI.Changes
RequestError(not a plainError) at the three session-init timeout sites so the real message survives the ACP/tRPC boundary instead of becoming "Internal error"abortController.abort()+query.close()) so theclaudesubprocess can't leak and pile up under the retry loopdata.detailsin the host-side "Failed to create session" log so the exported log names the actual cause for any other masked error/v1/modelsfetches withAbortSignal.timeout(10s)so a stalled gateway can't stall session initHow did you test this?
pnpm --filter @posthog/agent test— 709 passing, including a new refresh-timeout test (asserts aRequestErroris thrown and the timed-out query is closed) and afetchGatewayModelstimeout test@posthog/agentand@posthog/workspace-servertypecheck — cleanbiome linton the 5 changed files — cleanAutomatic notifications