fix(ocap-kernel): restore IO channels for persisted subclusters #963

Merged
FUDCo merged 2 commits into main from chip/restore-subcluster-io-channels on Jun 26, 2026

@FUDCo (Contributor) commented Jun 25, 2026

Summary

  • Adds SubclusterManager.restorePersistedIOChannels() and calls it from Kernel.#init before initializeAllVats.
  • Walks every persisted subcluster, finds those whose config declares io, and re-creates the channels via the IOManager.
  • Per-subcluster failures are logged but do not abort the broader init.
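
The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR. Only the names `restorePersistedIOChannels` and `createChannels` come from the PR text. The `IOConfig`, `PersistedSubcluster`, `IOManagerLike`, and `LoggerLike` shapes, the `createChannels(subclusterId, io)` argument list, and the returned list of restored IDs are assumptions made for the example.

```typescript
// Hypothetical shapes: the real types in @metamask/ocap-kernel may differ.
type IOConfig = Record<string, { type: string; path: string }>;

type PersistedSubcluster = {
  id: string;
  config: { io?: IOConfig };
};

type IOManagerLike = {
  // Assumed signature; the PR only states that createChannels is called again.
  createChannels(subclusterId: string, io: IOConfig): Promise<void>;
};

type LoggerLike = { error(message: string, cause?: unknown): void };

// Re-creates IO channels for every persisted subcluster that declares `io`.
// Returns the IDs that were restored (an assumption, added for easy testing).
async function restorePersistedIOChannels(
  subclusters: PersistedSubcluster[],
  ioManager: IOManagerLike | undefined,
  logger: LoggerLike,
): Promise<string[]> {
  const restored: string[] = [];
  // No-op when no IOManager was wired into the kernel.
  if (ioManager === undefined) {
    return restored;
  }
  for (const subcluster of subclusters) {
    const { io } = subcluster.config;
    // Skip subclusters whose config declares no IO.
    if (io === undefined) {
      continue;
    }
    try {
      await ioManager.createChannels(subcluster.id, io);
      restored.push(subcluster.id);
    } catch (error) {
      // A failure is logged and isolated to this subcluster; the loop continues.
      logger.error(`failed to restore IO channels for ${subcluster.id}`, error);
    }
  }
  return restored;
}
```

Catching inside the loop rather than around it is what keeps one broken socket path from blocking channel restoration for every other subcluster.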

Why

Subclusters that declare config.io (Unix sockets, named pipes, etc.) had their channels created inside launchSubcluster only. On a kernel restart the persisted vats are re-incarnated via initializeAllVats, but their IO channels were never re-created — any IOService references the vats held went dead at first use after the restart.

Concretely: the @ocap/service-matcher matcher vat persists across restarts (its OCAP URL is deterministic and its registry is in baggage), and it holds an IOService reference for the llm socket used to talk to its LLM bridge. After a daemon stop / daemon start cycle, the matcher vat came back up but the first registration attempt failed inside ingestService because the llm channel no longer existed in the kernel's IOManager. With this fix, the channel is re-established before any vat code runs.
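
The failure mode can be modeled in a few lines. The sketch below is a toy model under stated assumptions, not kernel code: `ModelKernel`, `bootKernel`, and `useChannel` are hypothetical names, and the channel registry is reduced to an in-memory set that starts empty on every boot while the list of declared channels survives in persisted config.

```typescript
// Hypothetical model: none of these names come from the kernel's API.
type ModelKernel = {
  // Process-local channel registry; it is empty again after every restart.
  channels: Set<string>;
};

// Boots a fresh kernel. `persistedIO` stands in for channel names declared in
// persisted subcluster config; `restore` toggles the step this PR adds.
function bootKernel(persistedIO: string[], restore: boolean): ModelKernel {
  const kernel: ModelKernel = { channels: new Set<string>() };
  if (restore) {
    for (const name of persistedIO) {
      kernel.channels.add(name);
    }
  }
  return kernel;
}

// Stand-in for a vat's first use of an IOService reference after restart.
function useChannel(kernel: ModelKernel, name: string): string {
  if (!kernel.channels.has(name)) {
    throw new Error(`no such IO channel: ${name}`);
  }
  return `sent via ${name}`;
}
```

In this model, `useChannel(bootKernel(['llm'], false), 'llm')` throws, which mirrors the failed registration described above, while the same call with `restore` set to `true` succeeds.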

Test plan

  • 4 new unit tests in SubclusterManager.test.ts covering: multi-subcluster restoration, skipping subclusters with no IO config, no-op when IOManager is absent, and continuing past a per-subcluster failure.
  • yarn workspace @metamask/ocap-kernel test:dev:quiet --run — all package tests pass.
  • yarn workspace @metamask/ocap-kernel lint — clean.
  • CI green.
  • End-to-end verification (in a downstream branch): daemon stop/start cycle now leaves the matcher's llm IOService live; first service registration after restart succeeds.

🤖 Generated with Claude Code


Note

Medium Risk
Touches kernel init ordering and external IO resources (sockets); failures are isolated per subcluster but affected vats still break at first IO use until the channel is restored.

Overview
Fixes dead IOService references after a kernel restart. They occurred because a subcluster's IO channels were created only once, inside launchSubcluster, and were never re-created when the kernel came back up.

Adds SubclusterManager.restorePersistedIOChannels(), which scans persisted subclusters with config.io and calls IOManager.createChannels again (same as at launch). Kernel.#init now awaits this after initSystemSubclusters and before initializeAllVats, so IO kernel services exist before re-incarnated vats run.
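
The init ordering described here can be pictured as a plain sequence of awaited steps. This is only a sketch: `InitHooks` and `initKernel` are hypothetical names, the three steps are passed in as callbacks so the example stays self-contained, and the real `Kernel.#init` presumably does more than this.

```typescript
// Hypothetical hook bundle; the step names mirror those used in the PR text.
type InitHooks = {
  initSystemSubclusters(): Promise<void>;
  restorePersistedIOChannels(): Promise<void>;
  initializeAllVats(): Promise<void>;
};

// Runs the three steps strictly in order, awaiting each before the next.
async function initKernel(hooks: InitHooks): Promise<void> {
  await hooks.initSystemSubclusters();
  // Channels must exist before any re-incarnated vat code runs.
  await hooks.restorePersistedIOChannels();
  await hooks.initializeAllVats();
}
```

Awaiting the restore step, rather than firing it off in the background, is what guarantees the channels are in place by the time vats start.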

Per-subcluster createChannels failures are logged only; init continues for other subclusters. No-op if no IOManager was wired. CHANGELOG and four unit tests cover the new behavior.

Reviewed by Cursor Bugbot for commit 96ed00e. Bugbot is set up for automated code reviews on this repo.

FUDCo added 2 commits June 25, 2026 16:29
Subclusters that declared `config.io` (Unix sockets, etc.) had
their channels created inside `launchSubcluster` only. On a kernel
restart the persisted vats were re-incarnated via `initializeAllVats`
but their IO channels were never re-created, so any IOService
references the vats held went dead at first use.

Add `SubclusterManager.restorePersistedIOChannels()` and call it
from `Kernel.#init` before `initializeAllVats`. Walks the persisted
subcluster table and re-creates declared IO channels via the
IOManager. Per-subcluster failures are logged and skipped rather
than aborting the broader init.

@github-actions (Contributor)

Coverage Report

| Status | Category | Percentage | Covered / Total |
|---|---|---|---|
| 🔵 | Lines | 71.37% (⬆️ +0.06%) | 8862 / 12416 |
| 🔵 | Statements | 71.2% (⬆️ +0.06%) | 9012 / 12657 |
| 🔵 | Functions | 72.57% (⬆️ +0.09%) | 2138 / 2946 |
| 🔵 | Branches | 64.88% (⬇️ -0.01%) | 3579 / 5516 |

File Coverage (Changed Files)

| File | Stmts | Branches | Functions | Lines | Uncovered Lines |
|---|---|---|---|---|---|
| packages/ocap-kernel/src/Kernel.ts | 88.49% (⬆️ +0.10%) | 77.77% (🟰 ±0%) | 82.6% (🟰 ±0%) | 88.49% (⬆️ +0.10%) | 300-303, 320, 344, 419-429, 517, 585, 651-654, 667, 677-678, 731, 750 |
| packages/ocap-kernel/src/vats/SubclusterManager.ts | 95.91% (⬆️ +0.26%) | 90.76% (⬆️ +0.60%) | 100% (🟰 ±0%) | 95.86% (⬆️ +0.28%) | 194-197, 251, 334, 339-341, 349 |

Generated in workflow #4488 for commit 96ed00e by the Vitest Coverage Report Action

@FUDCo FUDCo marked this pull request as ready for review June 26, 2026 00:06
@FUDCo FUDCo requested a review from a team as a code owner June 26, 2026 00:06

@sirtimid (Member) left a comment

LGTM

@FUDCo FUDCo added this pull request to the merge queue Jun 26, 2026
Merged via the queue into main with commit 03b6a62 Jun 26, 2026
33 checks passed
@FUDCo FUDCo deleted the chip/restore-subcluster-io-channels branch June 26, 2026 08:32
2 participants