
fix(agent): Surface real session-init errors and kill leaked CLIs #2654

Merged
charlesvien merged 2 commits into main from fix/agent-session-init-timeout on Jun 13, 2026

Conversation

charlesvien (Member) commented Jun 13, 2026

Problem

Connecting to an agent could fail with a generic "Internal error" that named no cause, retried every ~30s, and left orphaned claude subprocesses behind. The real failure, a 30s session-init timeout in ClaudeAcpAgent.createSession, was thrown as a plain Error, which the ACP layer collapses into a JSON-RPC -32603 "Internal error" (with the real text buried in data.details) before it reaches the logs or UI.
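A minimal sketch of how the cause gets lost, assuming the boundary forwards errors that already carry a JSON-RPC code and wraps everything else as -32603. `RequestError`, `toJsonRpcError`, and `logLine` are hypothetical stand-ins written for illustration, not the ACP SDK's actual API, and the timeout message text is made up.

```typescript
// Hypothetical JSON-RPC error shape; the ACP SDK's real types may differ.
interface JsonRpcError {
  code: number;
  message: string;
  data?: { details?: string };
}

// Hypothetical stand-in for a protocol-aware error that carries its own code,
// so the transport can forward its message unchanged.
class RequestError extends Error {
  readonly code: number;
  constructor(code: number, message: string) {
    super(message);
    this.name = "RequestError";
    this.code = code;
  }
}

// Assumed boundary behaviour: protocol errors pass through; anything else is
// collapsed into -32603 "Internal error" with the original text in data.details.
function toJsonRpcError(err: unknown): JsonRpcError {
  if (err instanceof RequestError) {
    return { code: err.code, message: err.message };
  }
  const details = err instanceof Error ? err.message : String(err);
  return { code: -32603, message: "Internal error", data: { details } };
}

// Host-side log line that appends data.details when present, so a masked
// error still names its cause.
function logLine(prefix: string, e: JsonRpcError): string {
  const details = e.data?.details ? ` (${e.data.details})` : "";
  return `${prefix}: ${e.message}${details}`;
}

const masked = toJsonRpcError(new Error("Session initialization timed out after 30s"));
const surfaced = toJsonRpcError(
  new RequestError(-32603, "Session initialization timed out after 30s"),
);
// logs "Failed to create session: Internal error (Session initialization timed out after 30s)"
console.log(logLine("Failed to create session", masked));
// logs "Failed to create session: Session initialization timed out after 30s"
console.log(logLine("Failed to create session", surfaced));
```

Throwing a code-carrying error at the source fixes the UI message; reading data.details in the log is the fallback for any other error that still gets masked.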

Changes

  1. Throw a descriptive RequestError (not a plain Error) at the three session-init timeout sites so the real message survives the ACP/tRPC boundary instead of becoming "Internal error"
  2. Abort the timed-out query (abortController.abort() + query.close()) so the claude subprocess can't leak and pile up under the retry loop
  3. Surface data.details in the host-side "Failed to create session" log so the exported log names the actual cause for any other masked error
  4. Bound the gateway /v1/models fetches with a 10s AbortSignal.timeout so a stalled gateway can't stall session init
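The timeout-and-cleanup pattern behind the first two changes can be sketched roughly as follows. This is an illustration under assumptions, not ClaudeAcpAgent.createSession itself: `QueryHandle`, `initWithTimeout`, and the local `RequestError` stand-in are hypothetical (the ACP SDK's class may have a different constructor), and the error code and message text are illustrative.

```typescript
// Hypothetical handle for a running CLI query; the real SDK object differs.
interface QueryHandle {
  close(): void;
}

// Hypothetical protocol-aware error so the message survives the RPC boundary.
class RequestError extends Error {
  readonly code: number;
  constructor(code: number, message: string) {
    super(message);
    this.name = "RequestError";
    this.code = code;
  }
}

// Race session init against a timer. On timeout, abort the controller and
// close the query so the subprocess is torn down, then reject with a
// descriptive RequestError instead of a plain Error.
async function initWithTimeout<T>(
  query: QueryHandle,
  abortController: AbortController,
  init: Promise<T>,
  timeoutMs: number,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      abortController.abort();
      query.close();
      reject(new RequestError(-32603, `Session initialization timed out after ${timeoutMs}ms`));
    }, timeoutMs);
  });
  try {
    return await Promise.race([init, timeout]);
  } finally {
    // Init settled first (or the timer already fired): never leave the timer armed.
    clearTimeout(timer);
  }
}

// Usage: an init that never settles is cut off and cleaned up.
const controller = new AbortController();
let closed = false;
const hung = new Promise<string>(() => {});
initWithTimeout({ close: () => { closed = true; } }, controller, hung, 50).catch((err) => {
  // logs: true true true
  console.log(err instanceof RequestError, controller.signal.aborted, closed);
});
```

Doing the cleanup inside the timer callback ties teardown to the timeout itself, so a caller that retries every ~30s cannot accumulate live subprocesses.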

How did you test this?

  • pnpm --filter @posthog/agent test — 709 passing, including a new refresh-timeout test (asserts a RequestError is thrown and the timed-out query is closed) and a fetchGatewayModels timeout test
  • @posthog/agent and @posthog/workspace-server typecheck — clean
  • biome lint on the 5 changed files — clean

Automatic notifications

  • Publish to changelog?
  • Alert Sales and Marketing teams?

charlesvien (Member, Author) commented Jun 13, 2026

github-actions (Bot) commented Jun 13, 2026

React Doctor found no issues in the changed files. 🎉

Reviewed by React Doctor for commit 17644d6.

charlesvien changed the title from "surface real agent init errors and kill orphans" to "fix(agent): Surface real session-init errors and kill leaked CLIs" Jun 13, 2026
charlesvien marked this pull request as ready for review June 13, 2026 05:12
greptile-apps (Bot, Contributor) commented Jun 13, 2026
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/agent/src/gateway-models.test.ts:112-138
**`fetchModelsList` timeout not covered by tests**

`fetchModelsList` received the same `AbortSignal.timeout` fix as `fetchGatewayModels`, but only `fetchGatewayModels` has a test verifying that the timeout degrades to `[]`. If the fix regressed, a stalled gateway would leave `fetchModelsList` hanging silently in production and no test would catch it. The two functions are structurally identical, so a single parameterised test covering both would be cleaner and would complete the coverage.
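A sketch of the parameterised check the review asks for, written with plain assertions rather than against the repo's test runner. `fetchGatewayModels` and `fetchModelsList` below are hypothetical stand-ins (their real signatures are not shown here); `FetchLike`, `stalledFetch`, and the injected `fetchImpl`/`timeoutMs` parameters are assumptions made so the timeout can be exercised without a real gateway.

```typescript
// Minimal fetch signature so a stub can be injected (an assumption for testability).
type FetchLike = (url: string, init?: { signal?: AbortSignal }) => Promise<Response>;

// Hypothetical stand-ins for the two gateway fetchers. Both bound the request
// with AbortSignal.timeout and degrade to [] on timeout, network error, or non-2xx.
async function fetchGatewayModels(baseUrl: string, fetchImpl: FetchLike, timeoutMs = 10_000): Promise<string[]> {
  try {
    const res = await fetchImpl(`${baseUrl}/v1/models`, { signal: AbortSignal.timeout(timeoutMs) });
    if (!res.ok) return [];
    const body = (await res.json()) as { data?: { id: string }[] };
    return (body.data ?? []).map((m) => m.id);
  } catch {
    return [];
  }
}

async function fetchModelsList(baseUrl: string, fetchImpl: FetchLike, timeoutMs = 10_000): Promise<{ id: string }[]> {
  try {
    const res = await fetchImpl(`${baseUrl}/v1/models`, { signal: AbortSignal.timeout(timeoutMs) });
    if (!res.ok) return [];
    const body = (await res.json()) as { data?: { id: string }[] };
    return body.data ?? [];
  } catch {
    return [];
  }
}

// Simulates a stalled gateway: it answers only after 60s unless the signal
// aborts first. The ref'd timer stands in for a hung socket and keeps the
// process alive while the fetcher waits.
const stalledFetch: FetchLike = (_url, init) =>
  new Promise<Response>((resolve, reject) => {
    const hang = setTimeout(() => resolve(new Response("{}")), 60_000);
    const signal = init?.signal;
    if (signal) {
      signal.addEventListener(
        "abort",
        () => {
          clearTimeout(hang);
          reject(signal.reason);
        },
        { once: true },
      );
    }
  });

// One loop covers both fetchers: each must give up and return [].
async function main(): Promise<void> {
  for (const fn of [fetchGatewayModels, fetchModelsList]) {
    const result = await fn("http://gateway.invalid", stalledFetch, 50);
    // logs "fetchGatewayModels: 0 models", then "fetchModelsList: 0 models"
    console.log(`${fn.name}: ${result.length} models`);
  }
}
main();
```

Iterating over both functions keeps the two timeout paths under one assertion, so a regression in either fetcher fails the same check.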

Reviews (1): Last reviewed commit: "surface real agent init errors and kill ..."

Comment thread packages/agent/src/gateway-models.test.ts
charlesvien added the Stamphog label (requests an autostamp by stamphog on small changes) Jun 13, 2026
github-actions (Bot) previously approved these changes Jun 13, 2026

github-actions (Bot) left a comment

Clean bug-fix: terminates leaked CLI processes on session-init timeout, surfaces real errors through the ACP layer, and bounds gateway fetch calls. No showstoppers; the bot's missing-test observation is a minor P2 coverage gap, not a blocker.

github-actions (Bot) dismissed their stale review June 13, 2026 06:23

New commits pushed (delta classified non_trivial_delta) — stamphog approval dismissed; re-review running automatically.

github-actions (Bot) left a comment

Clean bug-fix with test coverage for all changed paths; the process-leak and error-masking fixes are correct, the gateway fetch timeout is properly bounded, and the resolved bot comment about fetchModelsList coverage is directly addressed by the new parameterized test.

charlesvien merged commit 2b34b5e into main Jun 13, 2026
24 checks passed
charlesvien deleted the fix/agent-session-init-timeout branch June 13, 2026 07:14

Labels: Stamphog (requests an autostamp by stamphog on small changes)

1 participant