Total parse/edit: never crash on input, errors as data, equivalence gated (closes #39)#40

Open
johnsoncodehk wants to merge 50 commits into master from recovery

johnsoncodehk (Owner) commented Jun 11, 2026

parse/edit on the handle API are now total: every input produces a tree plus a cst.errors field ([{ offset, end, message }], empty when clean) — never a throw. API misuse (no changes, out-of-range coordinates, foreign/stale handles) still throws; the module-level strict API is untouched, so the whole conformance/parity gate chain runs unchanged.

```ts
const cst = p.parse(anything);     // total: never throws on input
p.edit(cst, changes);              // total: error states included
cst.errors                         // what went wrong, current after every edit
```
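A minimal sketch of that contract, assuming a toy bracket language in place of the real grammar. `parseTotal`, `edit`, `Cst` and `Change` are hypothetical stand-ins for `p.parse`/`p.edit`; only the `{ offset, end, message }` error shape comes from the PR. Input problems become entries in `errors`, while API misuse (an empty change list, out-of-range coordinates) throws.

```typescript
// Hypothetical stand-in for the handle API; only the error shape is from the PR.
interface ParseError { offset: number; end: number; message: string }
interface Cst { text: string; pairs: [number, number][]; errors: ParseError[] }
interface Change { start: number; end: number; text: string }

// Total over input: every string yields a result, and problems are reported as data.
function parseTotal(text: string): Cst {
  const pairs: [number, number][] = [];
  const errors: ParseError[] = [];
  const open: number[] = [];
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    if (c === "(") open.push(i);
    else if (c === ")") {
      const o = open.pop();
      if (o === undefined) errors.push({ offset: i, end: i + 1, message: "unexpected ')'" });
      else pairs.push([o, i]);
    } else errors.push({ offset: i, end: i + 1, message: `unexpected '${c}'` });
  }
  // Unclosed openers: zero-width diagnostic at EOF that names the opener.
  for (const o of open) {
    errors.push({ offset: text.length, end: text.length, message: `expected ')' to match '(' at ${o}` });
  }
  return { text, pairs, errors };
}

// Misuse still throws; malformed text never does.
function edit(cst: Cst, changes: Change[]): Cst {
  if (changes.length === 0) throw new RangeError("no changes");
  let text = cst.text;
  // Apply right-to-left so earlier offsets stay valid (assumes non-overlapping changes).
  const sorted = [...changes].sort((a, b) => b.start - a.start);
  for (const ch of sorted) {
    if (ch.start < 0 || ch.end < ch.start || ch.end > cst.text.length) {
      throw new RangeError("change out of range");
    }
    text = text.slice(0, ch.start) + ch.text + text.slice(ch.end);
  }
  return parseTotal(text); // toy: full re-parse, i.e. the "fresh" side of the gates
}
```

The real engine updates incrementally; this toy simply re-parses, which is the reference behavior the equivalence gates compare edits against.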

Design

  • Strict pass first: valid inputs take today's strict path exclusively — byte-identical trees, full PEG arm exploration (gated). Only a strict reject re-parses with recovery enabled; lexer errors flip the same edit into the recovering pass mid-flight.
  • Repetition recovery at spine-shaped loops (statement/member lists): a failing element absorbs tokens into an $error row (its leaves keep the text-tiling invariant) up to the element's FIRST set, the enclosing seq's follower literal, or EOF. Expression-internal hooks were measured to cascade (273 errors from one broken identifier) and are deliberately excluded.
  • Bar discipline keeps recovery equivalence-safe and arm-blind: recovery fires only where parsing is stuck at a strict-proven fail point (pos ≤ bar ≤ maxPos ≤ bar+2, stateless so losing longest-match arms cannot consume bars); a failure past the bars aborts the attempt, mints the next bar, and the pass re-runs — adoption keeps re-runs cheap, and the runParse safety net obeys the same gate.
  • Diagnostics are derived, not collected: $error rows are found by descending the structurally-propagated rowRM spine of the final tree (per-pass candidate lists double-count under stateless re-adoption); lexer diagnostics persist as structured entries formatted at settle time with current offsets, maintained by the window splice.
  • Recovered streams broke two strict-era invariants, both fixed: the relex window must anchor below the earliest lexer diagnostic before the damage (a dangling quote pairs with a later edit — backward coupling; forward coupling was already guarded by resync equality), and '>' splits disable adoption for the rest of the parse (the frozen damage mapping is invalid after a mid-parse index shift).
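The strict-first control flow and the bar-minting loop can be sketched as below. The callbacks are hypothetical: `strict` returns the strict-fail frontier (`maxPos`) on reject, and `attempt` either finishes a recovering parse under the current bars or reports the position where it failed past them. The cap of 32 attempts and the free-fire fallback are taken from the commit notes; the rest is an assumption about shape, not the engine's code.

```typescript
type StrictResult = { ok: true } | { ok: false; maxPos: number };
type AttemptResult = { done: true; errors: number[] } | { done: false; maxPos: number };

// Hypothetical driver: valid input never leaves the strict path; a reject seeds the
// first bar at the strict-fail frontier, and each failing attempt mints one more bar.
function parseWithBars(
  strict: () => StrictResult,
  attempt: (bars: readonly number[]) => AttemptResult,
  maxAttempts = 32,
): { bars: number[]; errors: number[]; freeFire: boolean } {
  const first = strict();
  if (first.ok) return { bars: [], errors: [], freeFire: false };
  const bars = [first.maxPos];
  for (let i = 0; i < maxAttempts; i++) {
    const r = attempt(bars);
    if (r.done) return { bars, errors: r.errors, freeFire: false };
    bars.push(r.maxPos); // failure past the bars: the next bar is a function of (text, bars)
  }
  // Cap reached: the real engine degrades to a deterministic free-fire pass here.
  return { bars, errors: [], freeFire: true };
}
```

Because the bar list only grows and each bar depends only on the text and the earlier bars, two engines that run this loop over the same text arrive at the same bars.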

A latent strict-mode hole, found and closed

The last equivalence divergence traced to a watermark contract violation that predates this PR: a Pratt rule's winning row is built before its failed LED extension arms run, so the row's stored lookahead extent under-records the rule's true probes. The memo watermark was always right, but the memo dies with its generation, and adoption reads the row. A later edit inside a failed arm's reads could therefore keep a stale row alive (for example, typing the ')' that turns a failed call arm into a successful one). Strict sessions never hit it because the triggering texts reject, and the reject was the firewall; total parsing removed the firewall and exposed the bug. Fix: the rule-level watermark is written back to the row at memo-store time.
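A sketch of the write-back, assuming rows live in parallel arrays as the names `rowStart`/`rowExt` suggest; the class and its method names are hypothetical. The point is the ordering: `finishNode` runs before the failed LED arms probe further, so only `memoStore` sees the rule's full reach.

```typescript
// Hypothetical row store: parallel arrays indexed by row id (positions in tokens).
class RowStore {
  rowStart: number[] = [];
  rowExt: number[] = []; // how far past its start the row's parse looked

  // Runs when the winning arm finishes; later (failed) arms have not probed yet.
  finishNode(start: number, probedUpTo: number): number {
    this.rowStart.push(start);
    this.rowExt.push(probedUpTo - start);
    return this.rowStart.length - 1;
  }

  // Runs after ALL arms, so maxPos is the rule-level watermark: write it back.
  memoStore(row: number, maxPos: number): void {
    this.rowExt[row] = Math.max(this.rowExt[row], maxPos - this.rowStart[row]);
  }

  // Adoption is sound only if the edit lies beyond everything the row read
  // (assumes the read window is [start, start + ext]).
  canAdopt(row: number, editPos: number): boolean {
    return editPos > this.rowStart[row] + this.rowExt[row];
  }
}
```

Without the `memoStore` step, the row below would be adopted across an edit at token 6 even though a failed arm had read that far.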

Gates (32/32)

  • test/recovery.ts (new): valid corpus ≡ strict byte-identical with empty errors; invalid corpus total + deterministic + spans-in-bounds; a char-by-char typing session through transiently-invalid states — every keystroke ≡ a fresh parse, tree and errors.
  • test/incremental-verify.ts reworked to total semantics: every seeded step (syntax-breaking included) compares tree + errors against a fresh recovering parse — 128 steps, 0 mismatches.
  • test/multi-doc.ts reworked: 60 interleaved steps across two documents including broken text, plus the 9-point handle contract.
  • Strict parity 0 mismatches, lexer streams byte-identical, batch in band (11.2×), agnostic 9/9.
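The typing-session gate can be pictured as a small generic harness: type the target text one character at a time, and after every keystroke compare the incrementally edited result with a fresh parse of the same text. `checkTypingSession` and its callback signatures are hypothetical, and comparing results by JSON serialization is an assumption made for brevity.

```typescript
// Hypothetical harness: returns the keystroke indices where edit != fresh.
function checkTypingSession<R>(
  target: string,
  fresh: (text: string) => R,
  edit: (prev: R, prevText: string, insertAt: number, ch: string) => R,
): number[] {
  const mismatches: number[] = [];
  let text = "";
  let cur = fresh(text);
  for (let i = 0; i < target.length; i++) {
    cur = edit(cur, text, i, target[i]); // one keystroke, applied incrementally
    text += target[i];
    // Tree AND errors must match; here the whole result is compared structurally.
    if (JSON.stringify(cur) !== JSON.stringify(fresh(text))) mismatches.push(i);
  }
  return mismatches;
}
```

Usage with a toy "parser" that tracks paren balance: a correct incremental update yields no mismatches, while an update that ignores `)` is caught at exactly the keystroke that types it.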

Numbers

Valid-state keystrokes are unchanged (9MB median 0.05ms warm). Broken-state keystrokes are incremental too (third follow-up commit): ~3–7ms on 9MB, ~100× over the interim full-re-parse design — see the recovering-adoption section below. Break/fix transitions still cost ~0.4s (the absorbed error region re-parses; recorded follow-up).

Closes #39.

Missing-token synthesis (follow-up commit)

Required token matchers in recovering mode synthesize a zero-width $missing leaf instead of failing, tsc-style: 'const x = f(1, 2;' keeps its Call structure and reports expected ')', and 'function g() { return 1;' closes the body and reports expected '}'. Synthesis is budget-free and position-pure: it fires iff a recovery bar lies in [pos, pos+2], never under probes (not()/optional/separator), and never in free-fire. It is therefore a pure function of (text, bars) and cannot desync the two engines.
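A sketch of that firing rule over a token array. `missAt` is the name used in the commit notes, but its signature here, along with `RecoveryState`, `Leaf` and `expectToken`, is hypothetical. The synthesized leaf is zero-width: `next` equals `pos`.

```typescript
interface RecoveryState {
  bars: readonly number[]; // strict-fail positions for this attempt (token indices)
  probing: number;         // > 0 inside not()/optional/separator probes
  freeFire: boolean;       // attempt cap exhausted
}

// Fires iff a bar lies in [pos, pos + 2], outside probes and outside free-fire.
function missAt(s: RecoveryState, pos: number): boolean {
  if (s.probing > 0 || s.freeFire) return false;
  return s.bars.some((b) => b >= pos && b <= pos + 2);
}

type Leaf =
  | { kind: "token"; text: string; pos: number }
  | { kind: "missing"; expected: string; pos: number };

function expectToken(tokens: readonly string[], pos: number, lit: string, s: RecoveryState): { leaf: Leaf; next: number } | null {
  if (tokens[pos] === lit) return { leaf: { kind: "token", text: lit, pos }, next: pos + 1 };
  if (missAt(s, pos)) return { leaf: { kind: "missing", expected: lit, pos }, next: pos }; // zero-width
  return null; // ordinary PEG failure
}
```

For 'const x = f(1, 2;' with a bar at the ';' (token 8), expecting ')' there synthesizes a $missing leaf; the same call under a probe, or with no nearby bar, fails as usual.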

Two structural consequences, both gated:

  • Zero-width matches now exist (only via synthesis — a strict zero-width element would never terminate its loop). Every loop discards them: plain reps break on pos === before alone, hooked reps discard + resync, leftRec continuations and Pratt LEDs refuse zero-width wraps. Same-position re-entry of a rule through a synthesized leading token is an unbounded recursion no grammar shape rules out — recovering runs keep a (rule, pos) in-progress set and fail the re-entry (PEG cycle semantics, zero strict-path cost). The sentinel also dissolved the bar +1 ladders the recursion crashes were minting: broken-doc recovery in the incremental gate dropped 10.7s → 1.2s.
  • The bar protocol's input must be adoption-invariant (bars = strict-fail maxPos; the next bar must be a function of (text, bars), not of what got adopted). Three fixes: frameMax, a frame-local advance watermark, makes recorded probe reaches (rowExt/memo watermarks) exact instead of contaminated by earlier-sibling probes — closing the recorded "exact per-frame extents" item at one extra compare on frontier breaches; recovery runs are adoption-free (attempt loop and the lex-recovered first run — a recovering row's reach is bar-history-dependent, so replaying it poisons the next bar); surgery refuses recovery-made trees (a strict splice into kept $error/$missing siblings was a fake strict success that froze the old text's recovery shape, shifted).
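The (rule, pos) in-progress set can be sketched as a guard around rule entry. `CycleSentinel` and the string key format are hypothetical. In recovering mode, re-entering a rule at the position where it is already running fails (PEG cycle semantics) instead of recursing forever; the strict path bypasses the set entirely.

```typescript
// Hypothetical guard; the real engine keeps this state in `recRunning`.
class CycleSentinel {
  private running = new Set<string>();
  constructor(private readonly recovering: boolean) {}

  enter<T>(rule: string, pos: number, body: () => T | null): T | null {
    if (!this.recovering) return body();   // strict path: no bookkeeping at all
    const key = `${rule}@${pos}`;
    if (this.running.has(key)) return null; // same-position re-entry: fail it
    this.running.add(key);
    try {
      return body();
    } finally {
      this.running.delete(key);
    }
  }
}
```

A rule whose first step is a synthesized (zero-width) token and then itself would loop without the guard; with it, the inner call fails once and the outer call falls back to another arm.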

New gate: test/incremental-grammars.ts — all 7 grammars × generative inputs × seeded edit sessions, every step edit ≡ fresh (tree + errors), self-consistency walk, totality: 672/672. test/recovery.ts gains exact-diagnostic synthesis pins; test/incremental-verify.ts gains the five protocol-pin GLUE pairs. Suite 33/33.

Missing-nonterminal synthesis (second follow-up commit)

$missing generalizes from tokens to RULES, the tsc "Expression expected" analog: 'const a = ;' reports expected Expr (one diagnostic, at the right offset), and the same holds for operator right-hand sides ('a + ;', '-;'), mixfix arms ('x ? y : ;'), and list elements after a real separator ('a, ;', 'f(1, ;'). Hooks sit at parseRuleEntry's fail exit plus the three Pratt rhs sites that bypass rule entries.

Placement follows commitment semantics: an optional group / repetition element fails freely while uncommitted (its absence is not a diagnostic), but once it consumes a real token, missing pieces synthesize — opt('=', Expr) commits at the real =; rep(seq(',', Expr)) cannot mint a phantom , but synthesizes the Expr after a real one. not() and separator probes stay absolutely suppressed; FIRST-token call-site guards open under recovering so the rule-entry hook is reachable.

The new shapes surfaced two latent bugs, both fixed: the previous commit's frameMax conversion had been double-applied at the 12 token-advance sites by a patch-script composition hole (token consumes never raised maxPos; both engines ran the same wrong bar protocol, so equivalence gates stayed green — a reminder that edit≡fresh cannot see consistently-wrong protocol inputs), and the memo-jump coordinate refresh read toff(tokN) past the token columns for zero-width rows minted at EOF (stale slots under handle reuse → an expected Expr at offset 8 in a 5-char document).

Synthesis quality pins: 3 → 9 exact-match cases in recovery.ts. Suite 33/33; 9MB: fresh 438ms, valid keystroke ~0.6–5ms warm, breaking 649ms, while-broken 438ms, fixing 368ms.

Recovering adoption under bar purity (third follow-up commit)

Typing in a broken 9MB document: ~440ms → ~3–7ms per keystroke. Recovery runs adopt rows from the previous tree again, made sound by two steps:

  • Position-pure decisions: recoverArmed now takes the failing element's own frame-local probe reach (staged frameMax) instead of reading the global maxPos — a global frontier parked on a far bar can no longer arm an unrelated loop. With that, every recovery decision (hook arming, token/nonterminal synthesis, the cycle sentinel) is a function of the row's window.
  • The bars-window predicate: a row adopts in a recovering run iff the bars inside its window [start, reach+2] are identical (shifted) to the bars the build run saw there (lastBars rides the document register set; strict trees carry [], free-fire trees null). Window text + window bars then determine the frame's behavior completely — including losing-arm fires and synthesis — so $error-containing rows adopt too (the error region is exactly what stays stable across far edits).
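A sketch of the bars-window predicate, assuming the row records its start, a relative extent `ext` (window [start, start + ext + 2]), and the bar list of the run that built it. `barsWindowEq` is the name from the commit notes; this signature is hypothetical. `null` models free-fire trees, which never adopt while recovering.

```typescript
function barsIn(bars: readonly number[], lo: number, hi: number): number[] {
  return bars.filter((b) => b >= lo && b <= hi);
}

// builtBars: bars the build run saw ([] for strict trees, null for free-fire trees).
// shift: how far the row moved since it was built (newPos = oldPos + shift).
function barsWindowEq(
  builtBars: readonly number[] | null,
  curBars: readonly number[],
  oldStart: number,
  ext: number,
  shift: number,
): boolean {
  if (builtBars === null) return false;
  const before = barsIn(builtBars, oldStart, oldStart + ext + 2).map((b) => b + shift);
  const now = barsIn(curBars, oldStart + shift, oldStart + shift + ext + 2);
  return before.length === now.length && before.every((b, i) => b === now[i]);
}
```

A row whose window held one bar, shifted by an edit elsewhere, still adopts if the current run has that same bar at the shifted position and no other bar inside the window.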

The changed fire pattern exposed a latent bug in committed code, now fixed: collectErrRows decoded a $error row's first kid as a token leaf unconditionally, but the runParse leftover net builds a wrapper $error whose kids are nodes — the message quoted text from a garbage offset, and the two text layers resolved it differently (equal trees, different messages — exactly the kind of impurity only the equivalence gate's error-field comparison can catch).

Gates: 672/672 cross-grammar, incremental gate 9.9× vs fresh on mixed sessions, suite 33/33.

The transition cliff, killed (fourth–sixth commits)

The recorded ~0.4s break/fix transitions decomposed into an attempt-re-parse cost and a lexer relex-to-EOF, each closed by a proven mechanism:

  • Cross-attempt memo survival: recovery attempts parse the same stream under a monotonically growing bar list, so a memo entry whose window is bar-free behaved strictly (no synthesis, no skip arming) and is a pure function of window text — valid in every later attempt. The exception is the cycle sentinel, which can fire without synthesis and depends on the ancestor stack: recRunning now carries entry serials, and a refusal leaning on a frame entered before the current one taints its memo entry (own-generation-only, taint propagating on reuse). This is precisely the hole that sank the first survival attempt.
  • Conditional lexer resync: the windowed relex previously resynced only on exact stack-depth equality, so a paren-balance-changing edit shifted the suffix's depth column and regrew the window to EOF (~130ms on 9MB). Two proven conditions replace it — content-equal stacks (equal depth, neither lex dipped below it since the damage start) in O(1), or a depth-shifted adoption when the old suffix never pops an entry open at the candidate (lazy suffix-min over the depth records), with the adopted depth column re-based by the shift. Closes four latent unsoundness classes of the old equality path along the way (flag-bearing candidates, template brace counters).
  • Recovering surgery: trySurgery now accepts recovery-made trees when the damage and the re-parsed span sit clear of every bar window — the splice then provably commutes with every recovery decision (attempt bar lists are prefixes of the final list, so one check covers all attempts). The suffix bars shift with the splice instead of being cleared.
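The O(1) branch of the conditional resync can be sketched with two numbers per lex. `LexTrack`, `track` and `fastResync` are hypothetical, and the toy lexer tracks only a paren depth. If both lexes sit at the same depth and neither dropped below it since the damage start (the starting depth counts), the stack entries below that depth were never touched, so the stacks are content-equal without comparing them.

```typescript
interface LexTrack {
  depth: number;    // paren-stack depth at the resync candidate
  minDepth: number; // lowest depth since the damage start (including the start)
}

// Toy relex of the damaged window from a known starting depth.
function track(startDepth: number, text: string): LexTrack {
  let depth = startDepth;
  let minDepth = startDepth;
  for (const c of text) {
    if (c === "(") depth++;
    else if (c === ")" && depth > 0) {
      depth--;
      minDepth = Math.min(minDepth, depth);
    }
  }
  return { depth, minDepth };
}

function fastResync(oldLex: LexTrack, newLex: LexTrack): boolean {
  return oldLex.depth === newLex.depth && oldLex.minDepth >= oldLex.depth && newLex.minDepth >= newLex.depth;
}
```

The depth-shifted case (different depths, suffix never pops an entry open at the candidate) needs the suffix-minimum records and is not modeled here.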

9MB transitions: breaking 335ms → ~6ms, fixing 230ms → ~3ms; valid keystroke ~1ms; while-broken typing at valid-path parity. perf-bench worst 472ms (was 802ms).

Diagnostics: viable sets + paired openers (seventh commit)

  • expected ',' or ']' — companion literals are provably still accepted at the failure position (repetitions before the failing literal are always re-enterable; one-shot optionals are crossed but contribute nothing, since they may already have consumed). A static FIRST union would name impossible continuations (after [1, 2 an expression is not viable); tsc under-reports the same position as ')' expected.
  • to match this '(': per-closer structural openers are derived by intersecting preceding-literal sets across all seq occurrences (')' -> '(', ']' -> '[' and 'while' -> 'do' fall out; no bracket list is hardcoded), attached as a related: { offset, end, message } field pointing at the opener leaf.
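The opener derivation can be sketched over a simplified grammar representation, assumed here to be a list of sequences whose items are literals (strings) or rule references. `deriveOpeners` and the `Item`/`Seq` types are hypothetical. For each closer, the literals preceding it in every occurrence are intersected; whatever survives is the structural opener.

```typescript
type Item = string | { ref: string };
type Seq = readonly Item[];

function deriveOpeners(seqs: readonly Seq[], closers: readonly string[]): Map<string, Set<string>> {
  const result = new Map<string, Set<string>>();
  for (const closer of closers) {
    let acc: Set<string> | null = null;
    for (const seq of seqs) {
      for (let i = 0; i < seq.length; i++) {
        if (seq[i] !== closer) continue;
        // Literals that precede this occurrence of the closer.
        const preceding = new Set<string>();
        for (let j = 0; j < i; j++) {
          const item = seq[j];
          if (typeof item === "string") preceding.add(item);
        }
        acc = acc === null ? preceding : new Set(Array.from(acc).filter((x) => preceding.has(x)));
      }
    }
    if (acc !== null && acc.size > 0) result.set(closer, acc);
  }
  return result;
}
```

With a parenthesized expression, a call, an array literal and a do-while loop, ')' maps to '(' (the do-while occurrence also sees 'do' and 'while', but the intersection removes them), ']' to '[' and 'while' to 'do'.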

Measured against tsc and tree-sitter (eighth commit)

test/head-to-head.ts, one 9MB document, identical single-character edit scripts (README has the full table): Monogram beats tsc on every phase (valid keystroke 0.37ms vs 37ms; while-broken 0.21ms vs 13.6ms) and beats or matches tree-sitter everywhere except the two break/fix transition edits — profiling attributes those almost entirely to lexer-layer suffix bookkeeping on the bench's first-touch 4.5MB cursor jump (one-time suffix-min allocation + EOF-relative re-basing); the strict-fail pass measures 0.35ms, the attempts 0.6ms, and repeated transitions at one cursor position settle to ~2ms. test/recovery-conformance.ts scores our diagnostics against tsc's parser on every conformance file it rejects: recall 59.1%, precision 82.4%.

Two syntactic over-accepts, fixed (ninth commit)

Unterminated templates (`tpl ${x;) parsed clean; a template head now commits to its middle/tail chain in both engines. Separately, a bare reserved 'case' slid through the identifier-expression fallback, letting 'switch (x) { case 1 y(); }' parse as three statements. A full accept/reject flip scan over the conformance corpus shows exactly one flip: an intentionally-invalid error test now correctly rejects.

The formal spine + a complete gate (tenth commit)

TOTAL-PARSING.md writes up the contract, the bar-determinism theorem, the three structural theorems, the window-replay theorem with its three corollaries, the lexer-resync soundness conditions, and the one known open caveat (row-level taint). test/exhaustive-edits.ts (CI, suite 34/34) checks edit ≡ fresh completely within a bound: every document ≤ 4 chars over a small bracket grammar × every single-character edit (~330k steps; the 3.2M-step depth-5 run is also clean). It caught a real one-case regression in the surgery length update on its first run.
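A bounded exhaustive check of this kind can be sketched as an enumerator plus a comparison loop. `documents`, `singleCharEdits` and `exhaustiveCheck` are hypothetical; "single-character edit" is assumed to mean one insertion, deletion or substitution, which may differ from the real gate's edit set, so the step counts here are not expected to reproduce the ~330k figure.

```typescript
// Every string over `alphabet` of length <= maxLen, shortest first.
function* documents(alphabet: string, maxLen: number): Generator<string> {
  yield "";
  let layer = [""];
  for (let len = 1; len <= maxLen; len++) {
    const next: string[] = [];
    for (const d of layer) for (const c of alphabet) next.push(d + c);
    yield* next;
    layer = next;
  }
}

// Insert any char anywhere, delete any char, or replace any char with a different one.
function* singleCharEdits(doc: string, alphabet: string): Generator<string> {
  for (let i = 0; i <= doc.length; i++) for (const c of alphabet) yield doc.slice(0, i) + c + doc.slice(i);
  for (let i = 0; i < doc.length; i++) {
    yield doc.slice(0, i) + doc.slice(i + 1);
    for (const c of alphabet) if (c !== doc[i]) yield doc.slice(0, i) + c + doc.slice(i + 1);
  }
}

function exhaustiveCheck<R>(
  alphabet: string,
  maxLen: number,
  fresh: (text: string) => R,
  edit: (prev: R, oldText: string, newText: string) => R,
): { steps: number; failures: number } {
  let steps = 0;
  let failures = 0;
  for (const doc of documents(alphabet, maxLen)) {
    const base = fresh(doc);
    for (const next of singleCharEdits(doc, alphabet)) {
      steps++;
      if (JSON.stringify(edit(base, doc, next)) !== JSON.stringify(fresh(next))) failures++;
    }
  }
  return { steps, failures };
}
```

Over the alphabet "()" with documents of length at most 1 there are 3 documents and 14 edit steps; an "incremental" update that ignores the edit is flagged immediately.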

Ceiling round (eleventh–fourteenth commits)

  • Row-level taint closed: rowRM is now bitwise — bit 1 keeps the structural error containment the diagnostics walk descends, bit 2 marks a context-tainted result (a frame whose parse leaned on the cycle sentinel finding an ancestor). Tainted rows refuse recovering adoption and run extension; the one open caveat in TOTAL-PARSING.md is gone.
  • Depth-0 shifted resync is O(1): the lexer keeps an ascending list of )-pops that found an empty paren stack (almost always empty); the dominant statement-boundary resync case then needs one end-of-list check instead of an O(suffix) minimum build. Steady break/fix transitions settle at ~2 ms (strict-fail pass 0.23 ms, attempts 0.46 ms; the raw 7-column suffix memmove measures 0.07 ms — no storage floor). The head-to-head's 12 ms transition rows measure a one-time 4.5 MB cursor jump (the EOF-relative bias boundary moves with the cursor — exactly the design that makes valid keystrokes 0.37 ms); the docs now attribute this correctly.
  • Statement keywords blocked as expressions, for-in takes comma objects: bare if/for/return/… parsed as identifier expressions, letting 'namespace if {}' fall apart into accepted identifier statements. Blocking 'for' exposed a real masked gap: 'for (a in b, c)' had been parsing as a call of the identifier 'for'. The for-in object now takes a full comma Expression, while for-of correctly keeps one AssignmentExpression. Every accept/reject flip across the conformance corpus was individually adjudicated against tsc: 7 flips, all toward tsc, zero away ('var' and 'extends' were caught by this scan leaning on tsc-accepted shapes and stay out). Error-report recall vs tsc's parser: 55.7% → 61.2%; the remaining 108 divergent files are enumerated in the ROADMAP (31 = the [Await]/[Yield] context class, 77 = named per-shape strictness items).
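The bitwise rowRM can be sketched with two flags. The constant names, the `Row` shape and the helpers are hypothetical; the values follow the "bit 1 / bit 2" description. Propagating the taint bit to parents is a conservative assumption of this sketch, not something the notes state.

```typescript
const RM_ERR = 1;   // subtree contains a $error/$missing row (diagnostics walk descends)
const RM_TAINT = 2; // result leaned on the cycle sentinel finding an ancestor

interface Row { kind: string; rm: number; kids: Row[] }

// Parents OR in their children's flags, so the error spine is found by descent.
function mkRow(kind: string, kids: Row[], ownFlags = 0): Row {
  const rm = kids.reduce((acc, k) => acc | k.rm, ownFlags);
  return { kind, rm, kids };
}

const canAdoptWhileRecovering = (r: Row): boolean => (r.rm & RM_TAINT) === 0;

// Diagnostics walk: skip every subtree whose error bit is clear.
function collectErrorRows(r: Row, out: Row[] = []): Row[] {
  if ((r.rm & RM_ERR) === 0) return out;
  if (r.kind === "$error" || r.kind === "$missing") out.push(r);
  for (const k of r.kids) collectErrorRows(k, out);
  return out;
}
```

Keeping both facts in one word lets the diagnostics walk and the adoption check test a single bit each, without the structural bit being polluted by context taint.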

…lence class

parse/edit on the handle API never crash on input: the STRICT pass runs
first (valid path byte-identical, full PEG arm exploration - gated by
test/recovery.ts section 1 and the untouched parity suite), and only a
strict reject re-parses under the recovery machinery:

- Repetition recovery at spine-shaped loops (ref / alt-of-refs elements;
  deep-FIRST hooks measured 273-error cascades from arm probing and were
  reverted): a failing element absorbs tokens into an $error row up to
  the element FIRST set / the enclosing seq's follower literal / EOF.
- BAR DISCIPLINE keeps recovery equivalence-safe and arm-blind: fires
  only where parsing is STUCK AT a strict-proven fail point
  (pos <= bar <= maxPos <= bar+2, stateless so losing arms cannot consume
  bars); failures past the bars abort the attempt and mint the next bar
  (32-attempt cap degrades to deterministic free-fire). The runParse
  safety net obeys the same discipline.
- The lexer recovers under the same flag (error tokens + structured
  diagnostics; window truncation keeps the LEX_RETRY regrow path).
- Diagnostics are DERIVED, not collected: $error rows found by
  descending the structurally-propagated rowRM spine (per-pass candidate
  lists double-counted under stateless re-adoption); lexer diagnostics
  live as structured entries formatted at settle time (stored message
  strings would embed stale offsets), maintained by the window splice
  and shifted by surgery.
- Recovered streams break two strict-era invariants, both fixed:
  windowed relexing must anchor BELOW the earliest lexer diagnostic
  before the damage (a dangling quote pairs with a later edit - backward
  coupling; forward coupling is already guarded by resync equality), and
  rows built during a recovering pass may under-record their probe
  watermark when any arm fired recovery (recFires stamping refuses them
  to strict adoption; relocate-path surgery also normalizes copied
  prefix rels - an end-relative value below the remapped rowNF boundary
  would drift on every later length update).
- '>' splits disable adoption for the rest of the parse (the frozen
  damage mapping is invalid after a mid-parse token-index shift).
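The repetition recovery in the first bullet can be sketched as a loop that, once armed, absorbs tokens into one $error node until it reaches a token in the element's FIRST set, the enclosing seq's follower literal, or EOF. `repWithRecovery`, its callbacks and the `Node` shape are hypothetical; `armed` stands in for the bar check.

```typescript
interface Node { kind: string; from: number; to: number }

function repWithRecovery(
  tokens: readonly string[],
  pos: number,
  element: (pos: number) => number | null, // next position on success
  first: ReadonlySet<string>,              // FIRST set of the element
  follower: string | null,                 // closing literal of the enclosing seq
  armed: (pos: number) => boolean,         // stuck at a strict-proven fail point?
): { nodes: Node[]; pos: number } {
  const nodes: Node[] = [];
  while (pos < tokens.length && tokens[pos] !== follower) {
    const next = element(pos);
    if (next !== null && next > pos) {
      nodes.push({ kind: "element", from: pos, to: next });
      pos = next;
      continue;
    }
    if (!armed(pos)) break; // not at a bar: ordinary PEG loop exit
    const from = pos++;     // absorb at least one token so the loop always advances
    while (pos < tokens.length && !first.has(tokens[pos]) && tokens[pos] !== follower) pos++;
    nodes.push({ kind: "$error", from, to: pos });
  }
  return { nodes, pos };
}
```

In a block 'let a ; @ # let b ; }', the two garbage tokens become one $error row and the list resumes at the next 'let'; unarmed, the loop simply stops at '@'.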

Gates: incremental-verify reworked to total semantics (every step
compares tree+errors against a fresh recovering handle, 128 steps, 0
mismatches), multi-doc reworked (60 interleaved steps incl. broken text,
contract 9/9), 31/31 suite, strict parity 0 mismatches.

KNOWN RESIDUAL (test/recovery.ts, not yet registered): typing-through-
invalid session diverges at 1 of 20 keystrokes - a strict pass-1 edit
ADOPTING over a post-recovery tree drops one Pratt wrap layer vs a
fresh strict parse (single-keystroke repro in the gate; suspected
adoption interplay with LED chains on recovering-built substrate).
…valence gated

The residual typing-session divergence traced to a watermark contract
violation that PREDATES recovery and was latent in strict incremental
parsing: a Pratt rule's winning row is finishNode'd BEFORE its failed
LED extension arms run (the NUD/shorter candidate survives the longest
match), so rowExt under-records the rule's true probe extent. The memo
watermark (maxPos at parseRuleEntry exit) was always correct - but the
memo dies with its generation, and ADOPTION reads the row. An edit
landing inside a failed arm's reads then kept a stale row alive ('const
x = f' was adopted with ext=4 even though typing ')' at token 20 turns the
failed call arm into a successful one). Strict sessions never caught it
because the texts that exercise it (unclosed calls) REJECT, and the
reject was the firewall; total parsing keeps such trees alive.

Fix: write the rule-level watermark back to the row at memo-store time
(rowExt[result] = max(rowExt[result], maxPos - start)). This subsumes the
recFires mode stamp (removed - rowRM is purely structural again for the
diagnostics walk), restoring broad strict adoption over recovered
substrates: broken-state keystrokes on 9MB dropped from ~1.6s to the
~0.3s bar-iteration cost (valid-state keystrokes stay at 0.05ms).

test/recovery.ts now fully green and REGISTERED (32/32): valid corpus
byte-identical to strict with empty errors, invalid corpus total and
deterministic, the char-by-char typing session 20/20 keystrokes
equivalent to fresh parses (tree AND errors). The interpreter gains
parseTotal/edit parity (no recovery machinery: degrades to a zero-width
$error root with the strict diagnostic).

incremental-verify 128 steps 0 mismatch, multi-doc 60 steps contract
9/9, strict parity 0 mismatches, lexer streams byte-identical, batch in
band (11.2x), agnostic 9/9.
The seeded mutation lists never inserted a bare ';' — splitting an
existing expression's structure (f(a;, b) / (a +; b) / obj.m(;1).n) was
covered only by the general machinery, not exercised. Both gates'
INSERT pools gain ';' and the glue list gains three explicit
break-then-compare pairs; verified break ≡ fresh and restore ≡ original
byte-identically (tree and errors) before pinning.

Observation for the conformance backlog: several of these broken shapes
parse with ZERO errors - the strict grammar itself accepts them
(over-accept surface, identical on both engines), not a recovery
artifact.
…onsistency

The incremental/recovery gates were TypeScript-only while every grammar
shares the emitted runtime - the non-TS incremental behavior (markup
lexer modes, the fallback-lexer path, other token algebras) was ungated.
test/incremental-grammars.ts closes that: generative inputs (grammar-gen)
per grammar x seeded char-level edit sessions, each step checking
(1) edited tree + errors byte-identical to a fresh handle parse,
(2) tree self-consistency - every span inside its ancestors (the
engine-internal invariant an external compare misses when both sides
share a corruption; the aggressiveChecks idea), and (3) totality.
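Check (2) can be sketched as a recursive walk that reports every node whose span is inverted or escapes its parent. `checkSpans` and the `Span` shape are hypothetical, and the sibling-overlap check is an extra assumption beyond "every span inside its ancestors".

```typescript
interface Span { from: number; to: number; kids: Span[] }

function checkSpans(root: Span, textLength: number): string[] {
  const problems: string[] = [];
  const walk = (n: Span, lo: number, hi: number, path: string): void => {
    if (n.from > n.to) problems.push(`${path}: inverted span`);
    if (n.from < lo || n.to > hi) problems.push(`${path}: [${n.from}, ${n.to}) outside [${lo}, ${hi})`);
    let prevEnd = n.from;
    n.kids.forEach((k, i) => {
      // Assumed extra check: children appear in order and do not overlap.
      if (k.from < prevEnd) problems.push(`${path}/${i}: overlaps previous sibling`);
      prevEnd = Math.max(prevEnd, k.to);
      walk(k, n.from, n.to, `${path}/${i}`);
    });
  };
  walk(root, 0, textLength, "root");
  return problems;
}
```

Because this walk only inspects one tree, it catches corruption that both sides of an edit-vs-fresh comparison share.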

It immediately found three real holes, all fixed:
- totalNet pushed its diagnostic into the VIEW layer, which the next
  settle rebuild wiped on exactly one side (now a kind-4 source entry
  formatted at settle - verbatim engine message).
- the fallback-lexer full-relex path never cleared persisted docLex, so
  a totality-net diagnostic outlived the edit that fixed the text.
- the window resync retracts the duplicated token push (tokN--) but
  left the lexer diagnostic emitted FOR that token: the persisted entry
  survives via the suffix shift AND the window's copy stayed - the same
  character double-reported. Retraction now pops the window's own
  entries at/after the retracted token (lexDiagBase floor).
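The third fix can be sketched as a window lexer that remembers where its own diagnostics begin. `WindowLexer`, `LexDiag` and the method names are hypothetical; only the idea of a `lexDiagBase` floor comes from the notes. Retracting the duplicated token also pops this window's diagnostics at or after it, while entries below the floor are never touched.

```typescript
interface LexDiag { tokenIndex: number; message: string }

class WindowLexer {
  tokens: string[] = [];
  diags: LexDiag[] = [];
  lexDiagBase = 0; // diagnostics below this index predate the current window

  beginWindow(): void {
    this.lexDiagBase = this.diags.length;
  }

  push(tok: string, diag?: string): void {
    if (diag !== undefined) this.diags.push({ tokenIndex: this.tokens.length, message: diag });
    this.tokens.push(tok);
  }

  // Resync found that the last pushed token duplicates the kept suffix.
  retractLast(): void {
    this.tokens.pop();
    const cut = this.tokens.length; // index the retracted token occupied
    while (this.diags.length > this.lexDiagBase && this.diags[this.diags.length - 1].tokenIndex >= cut) {
      this.diags.pop();
    }
  }
}
```

Without the pop, the suffix's persisted copy and the window's copy would both report the same character.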

672/672 steps across typescript/javascript/typescriptreact/
javascriptreact/yaml/html/vue (489 exercising recovery). 33/33 suite,
lexer streams byte-identical, parser parity 0 mismatches, batch in band.
…erved

Required token matchers in recovering mode now synthesize a zero-width
$missing leaf (expected identity in rowStart, LIT_NAMES/K_NAMES inverse for
the message) instead of failing, so 'const x = f(1, 2;' keeps its Call shape
and reports "expected ')'", and 'function g() { return 1;' closes the body
with "expected '}'". Synthesis is budget-free and position-pure: it fires
iff a recovery bar lies in [pos, pos+2] (missAt), never under probing
(not()/optional/separator probes) and never in free-fire.

Zero-width success is a synthesis-only artifact (a strict zero-width element
would never terminate its loop), so every loop discards it: plain reps break
on pos===before alone (restoring scn), hooked reps discard + recoverSkip,
leftRec continuations and Pratt LEDs refuse zero-width wraps. A rule can
still re-enter ITSELF at the same position through a synthesized leading
token — an unbounded recursion no grammar shape rules out — so recovering
runs keep a (rule, pos) in-progress set and fail the re-entry (PEG cycle
semantics; recRunning, zero strict-path cost). That sentinel also dissolved
the bar +1 ladders the recursion crashes were minting: broken-doc recovery
drops ~9x in the incremental gate (10.7s -> 1.2s).
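The plain-repetition rule ("break on pos === before alone") can be sketched as follows; `plainRep` and the `Step` callback are hypothetical. An element that succeeds without advancing (a synthesized zero-width match) is discarded and ends the loop, so recovery can never spin a repetition forever.

```typescript
// Returns the new position; may equal the input when a zero-width match was synthesized.
type Step = (pos: number) => number | null;

function plainRep(element: Step, start: number): { count: number; pos: number } {
  let pos = start;
  let count = 0;
  for (;;) {
    const before = pos;
    const next = element(pos);
    if (next === null || next === before) break; // failure or zero-width: discard and stop
    pos = next;
    count++;
  }
  return { count, pos };
}
```

With an element that always "succeeds" (consuming 'a' or synthesizing zero-width otherwise), the loop stops after the real matches instead of hanging.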

Equivalence (edit == fresh) exposed that the bar protocol's input was not
adoption-invariant; three structural fixes:

- frameMax: a frame-local advance watermark (reset to the rule's start at
  entry, folded into the parent on exit) replaces the global maxPos in
  rowExt/memo watermarks, making recorded probe reaches EXACT instead of
  contaminated by earlier-sibling probes. Bars (= strict-fail maxPos) now
  reconstruct identically under adoption; the hot advance pays one extra
  compare only at frontier breaches (frameMax <= maxPos nests the updates).
  This also closes the recorded "exact per-frame extents" backlog item and
  lands the bar on the true farthest probe (no more phantom synthesis from
  inflated memo-jump watermarks).
- Recovery runs are adoption-free (edit-side attempt loop AND the
  lex-recovered first run): a row recorded under a recovering frame carries
  that run's bar-dependent reach, so replaying it makes the next bar a
  function of the OLD bar history instead of (text, bars). Attempt 0 (empty
  bars, behaviorally strict) re-derives the true strict frontier; every
  attempt is byte-equal to the fresh side's. The barIn adoption-refusal
  window from the first synthesis attempt is dead under this rule and
  removed; adoptSeek's recovering rowRM bypass likewise.
- trySurgery refuses recovery-made trees (rowRM reaches the root
  structurally): a strict splice into kept $error/$missing siblings was a
  fake strict success that froze the OLD text's recovery shape, shifted.
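The frameMax bookkeeping from the first bullet can be sketched as a pair of watermarks. The `Watermarks` class and its methods are hypothetical. Resetting `frameMax` to the rule's start on entry and folding it back into the parent on exit gives each frame an exact reach, uncontaminated by earlier siblings. Since `frameMax <= maxPos` holds throughout (assuming a rule only starts at a position already reached), the global update nests inside the frame-local one.

```typescript
class Watermarks {
  maxPos = 0;   // furthest probe of the whole parse
  frameMax = 0; // furthest probe of the current rule frame

  probe(pos: number): void {
    if (pos > this.frameMax) {
      this.frameMax = pos;
      if (pos > this.maxPos) this.maxPos = pos; // only checked on a frontier breach
    }
  }

  frame<T>(start: number, body: () => T): { value: T; reach: number } {
    const parentMax = this.frameMax;
    this.frameMax = start;                         // frame-local reset
    const value = body();
    const reach = this.frameMax;                   // exact reach of this frame
    this.frameMax = Math.max(parentMax, reach);    // fold into the parent
    return { value, reach };
  }
}
```

A sibling that probed only to token 4 now records reach 4, even though an earlier sibling had pushed the global frontier to 7.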

Gates: incremental-grammars 672/672 across 7 grammars; recovery.ts gains a
synthesis-quality section (exact diagnostics + $missing presence) and 4
session-found invalid shapes; incremental-verify gains the 5 protocol-pin
GLUE pairs; multi-doc 60/60 + contract 9/9; check suite 33/33; corpus
parity 401/401 sample, lexer parity 5695; perf-bench PASS (worst 803ms vs
802ms baseline; 9MB valid keystroke unregressed). verify-rejects: a tsc
Debug.assert crash on 'await using' shapes is counted as ORACLE-CRASH and
skipped (a crashed oracle has no verdict) instead of killing the gate.

Required RULE references failing inside the bar window now mint a zero-width
$missing row carrying the rule identity (RULE_MISS_BASE + rid in rowStart),
reported as "expected Expr": 'const a = ;' / 'a + ;' / '-;' / 'x ? y : ;' /
'a, ;' / 'f(1, ;' all produce a single tsc-grade diagnostic at the right
offset. Hooks: parseRuleEntry's fail exit (memoized like any result) plus the
three Pratt rhs sites that bypass rule entries (operator LED, prefix NUD,
chain-rhs LED).

Synthesis placement follows COMMITMENT semantics, replacing the flat
probing counter for optionals: an optional group or repetition element may
fail freely while uncommitted (probeBase = its start; 'the optional thing is
absent' / 'the list ends' need no diagnostic), but once it consumes a real
token past that base, missing pieces synthesize — 'const a = ;' commits at
'=' and mints the Expr; rep(seq(',', Expr)) cannot mint a phantom ',' to
keep a list alive, yet after a real ',' the element synthesizes. not() and
separator probes stay absolutely suppressed (pure lookahead). FIRST-token
call-site guards open under recovering (one global read on the strict
guard-fail path): at a bar the next token is exactly what cannot start the
rule, and the hook lives inside parseRuleEntry — 'a, ;' must reach it.
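Commitment semantics can be sketched for the opt('=', Expr) case. `Ctx`, `maySynthesize` and `optLitThenRule` are hypothetical names; `probeBase` follows the notes. The optional may be absent without a diagnostic, but once its real '=' is consumed (the position has moved past the base), a missing Expr is synthesized, provided a bar is near and the parser is not inside pure lookahead.

```typescript
interface Ctx {
  probeBase: number;  // start of the innermost uncommitted optional/rep element
  lookahead: boolean; // inside not() or a separator probe
}

function maySynthesize(ctx: Ctx, pos: number, barNear: boolean): boolean {
  if (ctx.lookahead || !barNear) return false;
  return pos > ctx.probeBase; // committed: a real token was consumed since the base
}

// opt(lit, Rule) over a token array.
function optLitThenRule(
  tokens: readonly string[], pos: number, lit: string,
  rule: (p: number) => number | null, barNear: (p: number) => boolean, lookahead = false,
): { pos: number; missing: boolean } {
  const ctx: Ctx = { probeBase: pos, lookahead };
  if (tokens[pos] !== lit) return { pos, missing: false }; // absent: no diagnostic
  const afterLit = pos + 1;
  const r = rule(afterLit);
  if (r !== null) return { pos: r, missing: false };
  if (maySynthesize(ctx, afterLit, barNear(afterLit))) return { pos: afterLit, missing: true };
  return { pos, missing: false }; // uncommitted failure: backtrack
}
```

For 'const a = ;' the '=' commits and the Expr becomes $missing; for 'const a ;' the optional is simply absent.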

Two latent bugs fixed in passing, both found by the new shapes:

- The frameMax conversion in the previous commit was double-applied at the
  12 token-advance sites by a patch-script composition hole (edit #3's
  pattern matched text edit #2 had just inserted; the anchor counts were
  asserted on the pre-edit source), leaving the nested inner test
  unreachable — token consumes never raised the global maxPos, so bars were
  minted from a watermark that only memo jumps could move. Equivalence
  gates stayed green because both engines ran the same wrong protocol;
  the synthesis quality work surfaced it as losing-arm wins. Advances now
  pair frameMax/maxPos correctly.
- The memo-jump coordinate refresh read toff(start) unguarded; for a
  zero-width row minted AT EOF, start == tokN reads past the token columns
  (stale slots from a longer previous document under handle reuse) — the
  recovery gate's in-bounds check caught an "expected Expr" at offset 8 in
  a 5-char document. The refresh now uses the same EOF guard as offset().

recovery.ts synthesis pins 3 -> 9 (the six nonterminal shapes above, exact
diagnostics + $missing presence). All gates green: incremental-grammars
672/672, incremental-verify 136 steps, multi-doc 60 + 9/9, recovery
valid/invalid/typing/synthesis, suite 33/33, perf-bench PASS, 9MB fresh
438ms / valid keystroke warm ~0.6-5ms / breaking 649ms / while-broken
438ms / fixing 368ms (broken-state costs are the recorded follow-up).

Typing in a broken 9MB document drops from ~440ms to ~3-7ms per keystroke
(avg 3.2ms over a 10-keystroke burst; incremental gate 9.9x vs fresh on its
mixed valid/broken sessions). Recovery runs now ADOPT rows from the previous
tree again — soundly this time, by making every recovery decision a pure
function of the row's window:

- recoverArmed takes (from, reach): a hook arms iff THE FAILING ELEMENT is
  stuck at a bar — its own frame-local probe reach (staged frameMax around
  hooked-loop elements) sits on the bar. The old form read the GLOBAL maxPos,
  so a frontier parked on a far bar could arm an unrelated loop whose own
  probes never approached it — a decision no window can reproduce. The
  runParse nets pass (pos, maxPos): top-level semantics unchanged.
- barsWindowEq: a row adopts in a recovering run iff the bars inside its
  window [start, reach+2] are IDENTICAL (shifted) to the bars the build run
  saw there — with position-pure decisions, window text + window bars
  determine the frame's behavior completely, including losing-arm fires and
  synthesis. lastBars rides the document register set; strict trees carry
  [], free-fire trees null (free-fire is not bar-pure - never adopted while
  recovering). rowRM rows are adoptable under the predicate (the error
  region itself is what stays stable across far edits), and runExtend
  re-checks per member. The blanket adoption-off in the bar iteration and
  the lex-recovered first run is removed; attempt 0 (no bars) adopts exactly
  where the build run was also bar-free.
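The position-pure arming test can be sketched next to the global form it replaced. Both functions are hypothetical renderings of the `pos <= bar <= maxPos <= bar+2` discipline; the difference is only which reach is passed in.

```typescript
// Arms iff the failing element is stuck at a bar: it started at or before the bar,
// its own probes reached the bar, and they got at most two tokens past it.
function recoverArmed(bars: readonly number[], from: number, reach: number): boolean {
  return bars.some((bar) => from <= bar && bar <= reach && reach <= bar + 2);
}

// Retired form, for contrast: maxPos belongs to the whole parse, so a frontier
// parked on a far bar can arm a loop whose own probes never approached it.
function recoverArmedGlobal(bars: readonly number[], pos: number, maxPos: number): boolean {
  return recoverArmed(bars, pos, maxPos);
}
```

An element at token 1 whose own probes stop at token 3 is not armed by a bar at 10, although the global form would arm it once the frontier sits at 10.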

The changed fire pattern exposed a latent message-derivation bug present in
committed code: collectErrRows decoded a $error row's first kid as a token
leaf unconditionally, but the runParse leftover net builds a WRAPPER $error
whose kids are nodes ([partial-root, tail-error]) - (~nodeId)>>>2 indexed a
garbage column, docText read text from an unrelated offset, and the two text
layers (contiguous string vs pieces) resolved the garbage differently, which
is how the gate caught it (equal trees, different messages). Wrapper-shaped
$error rows now fall through to the generic descent so the tail derives its
message from its real first token.

All equivalence gates green (incremental-grammars 672/672, incremental-verify
136 steps, multi-doc, recovery incl. synthesis pins 9/9), suite 33/33,
perf-bench PASS, strict corpus parity intact. 9MB: fresh ~508ms, breaking
keystroke ~409ms (the absorbed error region re-parses; recorded follow-up
with fix-transition ~395ms), keystrokes while broken 3-13ms.

Recovery attempts within one sequence parse the same token stream under a
monotonically growing bar list, so a memo entry from an earlier attempt is
provably valid in a later one when its probe window [start, mx+2] contains
no bars: no bars means no synthesis and no skip arming, and the opened
dispatch guards only add non-consuming probes - the frame behaved strictly,
a pure function of the window text.

The one exception is the recRunning cycle refusal, which can fire without
synthesis (open guards let a ref chain cycle at one position) and depends
on which frames are on the stack. recRunning now maps each frame to an
entry serial; a refusal leaning on a frame entered before the current one
taints the current frame's memo entry (stamped -memoGenCur: reusable only
in its own generation, and propagating the taint to whoever reuses it).
This is the diagnosed hole that sank the first survival attempt.

Survival is edit-side only: the fresh-parse attempt loop calls parseCore,
which resets the arena cursor per attempt, so an earlier attempt's rows are
clobbered there. A mid-parse '>'-splice disables survival for the rest of
the sequence (pre-split positions can't be revalidated).

Also removes recFires (dead since the rowExt write-back subsumed the
recFires stamp).

9MB transitions: breaking 335ms -> 157ms, fixing 230ms -> 146ms (both now
lexer-bound); while-broken typing 3.4ms unchanged. All equivalence gates
green: incremental-grammars 672/672, incremental-verify 136, multi-doc 60,
recovery pins 9/9, check 33/33, emit-parser corpus parity 401/401.
…liff

The window relex resynced only on exact stack-depth equality, so an edit
that changes paren balance shifts the entire suffix's absolute depth column
and the window regrows to EOF - a 9MB document paid ~130ms of relexing on
every break/fix transition for a one-token depth shift.

The resync now has two sufficient conditions, both proven from observable
state (template stacks empty on both sides; candidate token carries no
cross-token lexer flag a successor reads):

- FAST (O(1)): equal depth and neither lex dipped below it since the
  divergence point (damage start) - every open entry is then common to
  both lexes, the stacks are content-equal, and every future pop behaves
  identically. Trajectory minimums are folded incrementally (old side
  seeded from the damage-interior tokens, new side tracked per push).

- SHIFTED: the old suffix never pops an entry open at the candidate
  (lazy suffix-min over the old depth records, pop-on-empty = -1): no open
  entry's head-ness is ever read again, stack contents are irrelevant, and
  the depths may differ by an arbitrary shift. The splice then re-bases the
  adopted tkPd column by the shift, restoring true absolute depths ('('
  head bits are local facts of their own neighbors and stay valid).
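A minimal sketch of the two resync tests, assuming the depth column is available as a plain per-token array (the real lexer keeps it in a token column and folds the trajectory minimums incrementally). fastResync, buildSuffixMin, shiftedResync and rebaseDepths are hypothetical names:

```typescript
/** FAST (O(1)): equal depth, and neither lex dipped below it since the damage start. */
function fastResync(
  oldDepth: number,
  newDepth: number,
  oldMinSinceDamage: number, // lowest depth the old lex reached since the damage start
  newMinSinceDamage: number, // lowest depth the window re-lex reached since then
): boolean {
  return oldDepth === newDepth && oldMinSinceDamage >= oldDepth && newMinSinceDamage >= newDepth;
}

/** suffixMin[i] = min(depths[i..]); built lazily, at most once per edit. */
function buildSuffixMin(depths: readonly number[]): number[] {
  const out = new Array<number>(depths.length);
  let m = Infinity;
  for (let i = depths.length - 1; i >= 0; i--) {
    m = Math.min(m, depths[i]);
    out[i] = m;
  }
  return out;
}

/**
 * SHIFTED: no old-suffix token after `candidate` drops below the depth open at
 * the candidate (a ')' that pops an empty stack records -1), so no entry open
 * there is read again and the two lexes may differ by an arbitrary shift.
 */
function shiftedResync(oldDepths: readonly number[], suffixMin: readonly number[], candidate: number): boolean {
  const next = candidate + 1;
  return next >= oldDepths.length || suffixMin[next] >= oldDepths[candidate];
}

/** After a SHIFTED resync, re-base the adopted depth column by the shift. */
function rebaseDepths(adopted: readonly number[], shift: number): number[] {
  return adopted.map((d) => d + shift);
}
```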

This also closes four latent unsoundness classes in the old equality path:
a resync candidate that is a postfix-ambiguous op, control keyword, '(' or
')' lets the adopted successor read state derived from tokens the window
re-lexed differently; and template-depth equality cannot prove the mutable
interp brace counters equal (resync inside templates now waits for depth
0). Each slides the resync at most a few tokens.

9MB transitions: breaking 157ms -> 5.8ms, fixing 146ms -> 2.9ms; valid
keystroke 1.8ms -> 1.1ms; while-broken typing 3.4ms -> ~2ms. Gates: lexer
parity 5695 diff=0, incremental-grammars 672/672, incremental-verify 136,
multi-doc 60, recovery pins 9/9, check 33/33, corpus parity 401/401,
perf-bench worst 472ms.

trySurgery refused any tree containing recovery rows (rowRM root). It now
accepts them when the edit provably commutes with every recovery decision:
decisions are position-pure functions of (window text, window bars), so a
splice is sound when no bar window touches the damage or the re-parsed
span's probe reach - kept rows replay identically at shifted positions, and
a fresh recovering parse behaves strictly across the span, exactly like the
strict re-parse the surgery runs (a fire inside the span would need a bar
at/below the probe reach + 2; prefix attempts use prefixes of the same bar
list, so one check against the final list covers every attempt). The
spliced tree keeps its bar list with suffix bars shifted by the token
delta; bars adjacent to the damage (unmappable) and free-fire trees
(lastBars null, not window-pure) refuse.
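The refusal rules above can be condensed into one predicate. This is a hypothetical simplification: bars and spans are old-document token indices, the re-parsed span's probe window is taken as [start, probeReach + 2], and "adjacent to the damage" is modeled as the closed range [startTok, endTok]. surgeryBars and its types are illustrative names:

```typescript
interface Span { start: number; probeReach: number } // re-parsed span, old token indices
interface Damage { startTok: number; endTok: number } // damaged old tokens

/** New bar list for the spliced tree, or null when surgery must refuse. */
function surgeryBars(
  lastBars: readonly number[] | null,
  span: Span,
  damage: Damage,
  tokDelta: number,
): number[] | null {
  if (lastBars === null) return null; // free-fire tree: not window-pure
  const out: number[] = [];
  for (const bar of lastBars) {
    // A bar at/below probeReach + 2 inside the span could make a fire happen there.
    if (bar >= span.start && bar <= span.probeReach + 2) return null;
    // A bar adjacent to the damage has no position to map to.
    if (bar >= damage.startTok && bar <= damage.endTok) return null;
    // Suffix bars shift by the token delta; prefix bars stay put.
    out.push(bar > damage.endTok ? bar + tokDelta : bar);
  }
  return out;
}
```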

The multi-doc gate immediately caught a latent length bug this exposed:
finishNode takes a node's char end from its LAST KID, which a trailing
zero-width $missing row pushes past the last real token - but surgery
re-derived ancestor lengths from the token columns, clipping that
extension. A node whose token end lies strictly beyond the damage now keeps
its end shape (rowLen += chrD: every end-determining coordinate sits in the
shifted suffix); only nodes ending at/inside the damage use the token
derivation (no zero-width row can end them - zero-width rows live at bars,
and damage-adjacent bars were refused). Strict trees take either branch to
the same value.

9MB while-broken typing now sits at valid-path parity (~1-1.7ms vs ~1ms
valid; surgery additionally applies wherever its container shapes allow).
Gates: multi-doc 60 + contract 9/9, incremental-grammars 672/672,
incremental-verify 136, recovery pins 9/9, check 33/33, corpus parity
401/401.

Two grammar-derived enrichments of the $missing diagnostics, both resolved
at settle from the tree (zero parse-time cost, adoption/replay-safe):

- PAIR_OPEN: for each literal C, intersect - across every seq occurrence of
  C with preceding literals in its sequencing scope (groups inlined;
  quantifier/alt contents inherit a copy of the scope's accumulator, since
  they physically follow its earlier literals; nothing leaks back) - the
  sets of those preceding literals. A unique survivor is C's structural
  opener: ')' keeps '(' through if/while/call alike, interior separators
  intersect away, and ','/':'/'(' themselves die as ambiguous. The closer's
  diagnostic then carries related info pointing at the matched opener leaf
  found among its earlier siblings ("expected ')'" / "to match this '('"),
  with keyword pairs like 'while'<-'do' falling out for free. shiftDiags
  shifts the related anchor on its own coordinates (it can sit on the other
  side of the damage from its diagnostic - the surgery path caught this).

- Viable-set messages: for a required literal C in a seq, the literals
  PROVABLY still accepted when C's matcher fails - repetitions before C are
  always re-enterable so their nullable-prefix-reachable literals stay
  viable; nullable one-shot items are crossed but contribute nothing (they
  may already have consumed). "expected ',' or ']'" therefore never names
  an impossible continuation, unlike a static FIRST union (after `[1, 2` an
  expression is not viable) - and unlike tsc, which under-reports the same
  position as "')' expected". Registered per call site during emission and
  threaded through the literal matchers into the $missing row (rowStart
  bits 21+; the row is zero-kid, the slot is free), decoded at settle.
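The PAIR_OPEN intersection reduces to a few lines once each sequencing scope is flattened to its literals. This sketch assumes groups are already inlined and omits the accumulator copy handed to quantifier/alt contents; pairOpeners is a hypothetical name:

```typescript
function pairOpeners(scopes: readonly string[][]): Map<string, string> {
  const survivors = new Map<string, Set<string>>();
  for (const lits of scopes) {
    // i = 0 has no preceding literal, so that occurrence constrains nothing.
    for (let i = 1; i < lits.length; i++) {
      const c = lits[i];
      const preceding = new Set(lits.slice(0, i));
      const prev = survivors.get(c);
      survivors.set(c, prev ? new Set(Array.from(prev).filter((x) => preceding.has(x))) : preceding);
    }
  }
  const openers = new Map<string, string>();
  survivors.forEach((set, c) => {
    if (set.size === 1) openers.set(c, Array.from(set)[0]); // unique survivor = structural opener
  });
  return openers;
}
```

With if/while/call/array/do-while scopes, ')' resolves to '(', ']' to '[' and 'while' to 'do', while ',' and '(' intersect away.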

cst.errors entries gain an optional related: {offset, end, message} field.
Pins re-pinned (11/11, exact); gates: incremental-grammars 672/672,
incremental-verify 136, multi-doc 60, check 33/33, corpus parity 401/401,
perf-bench unchanged.

test/head-to-head.ts runs one 9MB TypeScript document through identical
single-character edit scripts (warm valid keystrokes, a paren-deleting
breaking edit, while-broken typing, the fixing edit) on all three engines,
with positions recomputed from the current text so every engine sees
byte-identical edits and timers wrapping only the engine call. tsc runs
setParentNodes=false; node-tree-sitter caps input strings at 32767 chars,
so it reads through its 16KB chunk-callback path.
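A sketch of the harness discipline described above: positions recomputed from the current text, the timer around the engine call only. Change, Engine and ScriptStep are hypothetical shapes, not the actual types in test/head-to-head.ts, and performance.now() is assumed as the clock:

```typescript
interface Change { offset: number; deleteLen: number; insert: string }
interface Engine { name: string; apply(nextText: string, change: Change): void }
type ScriptStep = (currentText: string) => Change; // position derived from the CURRENT text

function runScript(engines: Engine[], initial: string, script: ScriptStep[]): Map<string, number[]> {
  const timings = new Map<string, number[]>();
  engines.forEach((e) => timings.set(e.name, []));
  let text = initial;
  for (const step of script) {
    const change = step(text); // recomputed per step, so every engine sees the same bytes
    const next = text.slice(0, change.offset) + change.insert + text.slice(change.offset + change.deleteLen);
    for (const e of engines) {
      const t0 = performance.now();
      e.apply(next, change); // only the engine call sits inside the timer
      timings.get(e.name)!.push(performance.now() - t0);
    }
    text = next;
  }
  return timings;
}
```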

Results (node v24, Apple silicon): Monogram beats tsc on every phase
(fresh 177 vs 212ms, valid keystroke 0.37 vs 37ms, while-broken 0.21 vs
13.6ms, fixing 1.0 vs 14.1ms) and beats or matches tree-sitter on fresh
(177 vs 458ms) and while-broken typing; tree-sitter wins the two
transition edits (0.26 vs 13ms breaking), where the strict-first
architecture pays one adoption-assisted strict pass to prove rejection
before recovering. Numbers + the two byte-identity guarantees added to
the README under 'How it measures up'.

test/recovery-conformance.ts: on every single-file conformance test tsc's
PARSER rejects (parseDiagnostics non-empty - the live source of the
.errors.txt syntax baselines, with semantic noise excluded by definition),
compare Monogram's total-parse cst.errors bidirectionally at +/-8 chars:

  recall    (tsc errors we also report):   530/951 = 55.73%
  precision (our errors tsc also reports): 580/702 = 82.62%
  first-error agreement:                   203/355 = 57.18%
  files we accept but tsc rejects:         116
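One plausible per-file reading of the bidirectional +/-8-char comparison, sketched under the assumption that diagnostics are matched on their start offsets only and that first-error agreement compares the earliest diagnostic on each side. compareFile and Diag are hypothetical; corpus percentages would then sum the per-file hit counts over each side's totals:

```typescript
interface Diag { offset: number; end: number; message: string }

/** How many entries of `from` have a partner in `to` within `tol` chars. */
function matchedCount(from: readonly Diag[], to: readonly Diag[], tol: number): number {
  return from.filter((a) => to.some((b) => Math.abs(a.offset - b.offset) <= tol)).length;
}

function compareFile(tsc: readonly Diag[], ours: readonly Diag[], tol = 8) {
  const first = (ds: readonly Diag[]) =>
    ds.reduce<Diag | undefined>((m, d) => (m === undefined || d.offset < m.offset ? d : m), undefined);
  const t0 = first(tsc);
  const o0 = first(ours);
  return {
    recallHits: matchedCount(tsc, ours, tol), // tsc errors we also report
    precisionHits: matchedCount(ours, tsc, tol), // our errors tsc also reports
    firstErrorAgrees: t0 !== undefined && o0 !== undefined && Math.abs(t0.offset - o0.offset) <= tol,
    weAcceptTheyReject: tsc.length > 0 && ours.length === 0,
  };
}
```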

The sample divergences localize the gap classes: the accept side is
dominated by tsc's context-parameter checks ([Await]/[Yield] parameter
positions, reserved names in declaration slots) plus a few CFG-expressible
shapes; the missed side is recovery-policy granularity (one absorbed
region vs tsc's several pointed diagnostics).

Two syntactic over-accepts found by the diagnostics comparison against tsc:

- parseTemplateExpr (both engines) treated a template HEAD as committing to
  nothing: on EOF or any non-middle/tail token after a substitution it
  closed the $template node and returned success, so 'let s = `tpl ${x;'
  parsed clean. A head now commits to the full chain - every substitution
  must hold an expression and every span must continue (middle) or close
  (tail); an unterminated template is a parse failure, not a shorter match.
  Also rejects empty substitutions ('`${}`'), matching tsc.

- notReservedExpr gains 'case': the bare-identifier expression fallback
  accepted the reserved word, so 'switch (x) { case 1 y(); }' parsed as
  three statements through the switch body's Stmt arm (the flat
  many(SwitchCase) shape made the missing ':' invisible).
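The head-commits rule for templates can be sketched over pre-lexed template pieces. The token kinds (nosub/head/middle/tail/other) and the parseExpr callback are assumptions for illustration, not parseTemplateExpr's real signature:

```typescript
interface Tok { kind: "nosub" | "head" | "middle" | "tail" | "other" }
type ParseExpr = (toks: readonly Tok[], i: number) => number; // index after the expr, or -1

/** Index after the template at `i`, or -1. A head commits to the whole chain. */
function parseTemplate(toks: readonly Tok[], i: number, parseExpr: ParseExpr): number {
  const t = toks[i];
  if (t?.kind === "nosub") return i + 1;
  if (t?.kind !== "head") return -1;
  let pos = i + 1;
  for (;;) {
    const afterExpr = parseExpr(toks, pos);
    if (afterExpr < 0) return -1; // `${}` or a broken substitution: failure, not a shorter match
    const span = toks[afterExpr];
    if (span?.kind === "tail") return afterExpr + 1;
    if (span?.kind !== "middle") return -1; // EOF or a stray token: unterminated template
    pos = afterExpr + 1;
  }
}
```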

A full accept/reject flip scan over the single-file conformance corpus
shows exactly ONE flip: TemplateExpression1.ts (an intentionally-invalid
error test tsc rejects) now correctly rejects - no valid file regressed.
Error-recovery conformance recall 55.7% -> 59.1%; check 33/33, engine
parity 401/401, all 7 generated outputs byte-identical.

TOTAL-PARSING.md: the formal spine in one place - the totality contract,
strict-first two-pass structure, the bar discipline with its determinism
theorem (bars are a pure function of the token stream, forcing every
ingredient to be adoption-invariant), position-pure recovery actions with
commitment semantics, the three structural theorems the generative gates
forced (zero-width = synthesis-only; same-position cycles and their taint
refinement; exact adoption-invariant watermarks), the window-replay theorem
with its three corollaries (recovering adoption, cross-attempt memo
survival, recovering surgery) and the one known open caveat (row-level
taint), the two lexer-resync soundness conditions, tree-derived
diagnostics, and the measured head-to-head numbers.

test/exhaustive-edits.ts (CI gate 34/34): over a small bracket-and-list
grammar, EVERY document up to 4 chars over the grammar's alphabet x EVERY
single-character edit (delete/replace/insert at every position) must parse
byte-identically to fresh - tree and errors. Complete within its bound:
~330k steps (EXH_MAXLEN=5 runs the 3.2M-step deep version, also clean).
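The enumeration behind the gate can be sketched as follows. documents, singleCharEdits and exhaustive are hypothetical names, `check` stands in for comparing edit(parse(before), change) against parse(after) (tree and errors), and skipping a replacement by the same character is an assumption about how steps are counted:

```typescript
interface Change { offset: number; deleteLen: number; insert: string }

/** Every document over `alphabet` with length 0..maxLen. */
function documents(alphabet: readonly string[], maxLen: number): string[] {
  const all: string[] = [""];
  let layer = [""];
  for (let len = 1; len <= maxLen; len++) {
    const next: string[] = [];
    for (const d of layer) for (const c of alphabet) next.push(d + c);
    for (const d of next) all.push(d);
    layer = next;
  }
  return all;
}

/** Delete, replace and insert at every position. */
function singleCharEdits(doc: string, alphabet: readonly string[]): Change[] {
  const out: Change[] = [];
  for (let i = 0; i < doc.length; i++) {
    out.push({ offset: i, deleteLen: 1, insert: "" });
    for (const c of alphabet) if (c !== doc[i]) out.push({ offset: i, deleteLen: 1, insert: c });
  }
  for (let i = 0; i <= doc.length; i++) for (const c of alphabet) out.push({ offset: i, deleteLen: 0, insert: c });
  return out;
}

const applyChange = (doc: string, ch: Change) => doc.slice(0, ch.offset) + ch.insert + doc.slice(ch.offset + ch.deleteLen);

function exhaustive(
  alphabet: readonly string[],
  maxLen: number,
  check: (before: string, change: Change, after: string) => boolean,
): { steps: number; failures: string[] } {
  let steps = 0;
  const failures: string[] = [];
  for (const doc of documents(alphabet, maxLen)) {
    for (const ch of singleCharEdits(doc, alphabet)) {
      steps++;
      const after = applyChange(doc, ch);
      if (!check(doc, ch, after)) failures.push(`${JSON.stringify(doc)} -> ${JSON.stringify(after)}`);
    }
  }
  return { steps, failures };
}
```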

The gate immediately earned its keep: it caught a one-case regression in
the day-old surgery length update - a node whose BASE token sits at the
damage start (leading trivia inserted at a node's very start) shifts base
and end together, leaving the length alone, so rowLen += chrD was wrong
exactly where the token derivation is right. keepEnd now also requires the
base token to sit strictly before the damage.
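Together with the earlier zero-width $missing fix, the ancestor length update reads roughly as below. This sketch assumes char-offset geometry where `start` is the node's base token start and `tokenEnd` is where its last real token ends; splicedLen and RowGeom are hypothetical names, and the token derivation is supplied by the caller:

```typescript
// len may reach past tokenEnd (a trailing zero-width $missing row), which a
// pure token derivation would clip.
interface RowGeom { start: number; len: number; tokenEnd: number }

/** Length after old chars [dStart, dEnd) are replaced and the document grows by chrD. */
function splicedLen(
  row: RowGeom,
  dStart: number,
  dEnd: number,
  chrD: number,
  deriveFromTokens: () => number,
): number {
  // keepEnd: base token strictly before the damage AND token end strictly beyond it,
  // so every end-determining coordinate sits in the shifted suffix.
  const keepEnd = row.start < dStart && row.tokenEnd > dEnd;
  return keepEnd ? row.len + chrD : deriveFromTokens();
}
```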
Phase-timing the head-to-head's 13ms breaking edit: the strict-fail pass
is 0.35ms and the recovery attempts 0.6ms - the cost is lexer-layer suffix
bookkeeping on the bench's first-touch 4.5MB cursor jump (a one-time
suffix-min allocation plus EOF-relative re-basing of the token columns
across the jump). Repeated break/fix transitions at one cursor position
settle to ~2ms. README and TOTAL-PARSING.md now say so instead of blaming
the strict-first pass.

rowRM becomes bitwise: bit 1 keeps the structural error containment the
diagnostics walk descends; bit 2 marks a CONTEXT-TAINTED result - a frame
whose parse leaned on the cycle sentinel finding an ancestor (its outcome
is a function of the ancestor stack, not the text). The memo stamp alone
only protected the entry; the row adoptSeek can find was still reusable.
Tainted rows now also refuse recovering adoption and run extension,
closing the open caveat documented in TOTAL-PARSING.md. Strict adoption
already required rowRM === 0 and is unchanged.
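The bitwise split and the predicates it feeds can be sketched as below. The flag values assume "bit 1" and "bit 2" mean the values 1 and 2, and the predicate names are hypothetical:

```typescript
const RM_ERROR = 1; // structural error containment the diagnostics walk descends
const RM_TAINT = 2; // context-tainted: the outcome depended on the ancestor stack

/** Strict adoption is unchanged: only rows with no recovery bits at all. */
const strictAdoptable = (rowRM: number) => rowRM === 0;
/** Recovering adoption: error rows may adopt (window bars permitting), tainted rows never. */
const recoveringAdoptable = (rowRM: number, windowBarsEqual: boolean) =>
  (rowRM & RM_TAINT) === 0 && windowBarsEqual;
/** Run extension follows the same taint rule. */
const runExtendable = (rowRM: number) => (rowRM & RM_TAINT) === 0;
/** The diagnostics walk only cares about containment. */
const diagnosticsDescends = (rowRM: number) => (rowRM & RM_ERROR) !== 0;
```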

notReservedExpr gains 'class': a valid class expression always out-matches
the bare-identifier fallback under longest-match, so forbidding the
fallback only rejects broken classes - 'const k = class extends D ;' with
no body parsed as three statements. A zero-flip accept/reject scan over
the whole single-file conformance corpus proves no valid shape regressed;
'extends' stays OUT - it is load-bearing for tsc's tolerated heritage
shapes ('interface I extends { }', 'extends A extends B', 'extends
Foo?.Bar' are all parse-accepted by tsc through the fallback, measured).

Gates: 34/34, corpus parity 401/401, generated outputs byte-identical,
transitions unchanged (~6ms first-touch, ~2ms steady).

The shifted lexer resync's dominant case is a depth-0 candidate (statement
boundary), where 'the old suffix never pops an entry open at the candidate'
collapses to 'no pop-on-empty beyond the candidate'. The lexer now records
the token indices of ')' pops that found an empty paren stack (an ascending
doc-level list, almost always empty - a stray closer beyond balance),
recomposed by the window splice, shifted by the '>'-split, and persisted on
the document register set. The depth-0 check is then one end-of-list
comparison instead of an O(suffix) minimum build; only depth > 0 candidates
(e.g. the fixing direction of a broken document) still build the suffix
minimum, lazily once per edit.
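A sketch of the depth-0 shortcut and of keeping the pop-on-empty list consistent across a window splice (the '>'-split shift is analogous and omitted). depth0Resync and spliceList are hypothetical names:

```typescript
/**
 * popOnEmpty: ascending token indices of ')' tokens that found an empty paren
 * stack. For a depth-0 candidate, SHIFTED holds iff no such pop lies after it.
 */
function depth0Resync(popOnEmpty: readonly number[], candidate: number): boolean {
  return popOnEmpty.length === 0 || popOnEmpty[popOnEmpty.length - 1] <= candidate;
}

/**
 * Recompose the list when a window splice replaces old tokens [from, to) with
 * `inserted` new tokens; `windowPops` come from the window lex, already in new
 * coordinates, so the result stays ascending.
 */
function spliceList(
  list: readonly number[],
  from: number,
  to: number,
  windowPops: readonly number[],
  inserted: number,
): number[] {
  const delta = inserted - (to - from);
  return list
    .filter((i) => i < from)
    .concat(windowPops)
    .concat(list.filter((i) => i >= to).map((i) => i + delta));
}
```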

Steady-state breaking transitions on 9MB drop ~2.1ms -> ~1.6-1.9ms; the
profile now reads strict-fail 0.23ms + attempts 0.46ms + spread
bookkeeping, with the raw 7-column suffix memmove measured at 0.07ms - no
storage floor in the way. README/TOTAL-PARSING tables refreshed from a
fresh head-to-head run, with the cursor-jump amortization stated as what
it is (a far jump pays once, proportional to distance; local typing never
rewrites the suffix).

Gates: 34/34, lexer parity 5695 diff=0, incremental-grammars 672/672,
corpus parity, perf-bench under ceiling.

notReservedExpr grows by the statement keywords with no expression role:
break, continue, debugger, do, else, finally, for, if, return, switch,
try, while, with. Bare 'if' parsed as an identifier expression, which let
'namespace if {}' (the namespace arm correctly fails its notReserved name)
fall apart into three accepted identifier statements - the same fallback
family as 'case'/'class'. 'var' stays OUT: tsc parse-accepts 'for (var of
X)' through shapes that need it.

Blocking 'for' exposed a real grammar gap the fallback had been MASKING:
'for (a in b[c] = b[c] || [], d)' previously parsed as a CALL of the
identifier 'for' (the for-statement arm failed, the call parse won). The
for-in OBJECT is a full Expression - comma included - so both ForHead
in-arms gain many(',', Expr); for-of keeps a single AssignmentExpression
(tsc rejects 'for (x of a, b)', and so do we, where we previously
mis-accepted it through the call fallback).

Per-flip tsc verdict over the whole single-file conformance corpus:
7 flips, ALL toward tsc, 0 away. Error-recovery conformance recall
59.1% -> 61.2%, first-error agreement 57.5% -> 59.7%, we-accept files
115 -> 108. Gates 34/34, corpus parity 401/401, tree-sitter generate
clean on all 4 affected grammars, gate:treesitter 96.0%.
The 108 remaining accept-divergences split into the [Await]/[Yield]
context class (31 files - needs exclude()-style identifier-text context
threading in the engine) and 77 per-shape strictness items, each named
with its fix recipe (fix + flip-scan FN=0 proof).
…reject

ClassMember modeled decorators as a STANDALONE sibling alternative, which
tolerated an orphan '@dec' with no member and (together with the
modifier-named-field fallback) any decorator/modifier interleaving.
Decorators are now a prefix of the member shape ([many(DecoratorExpr),
many(Modifier), ...]) in both grammars, with the static-block arm taking
the same prefix ('@dec static {}' is parse-clean for tsc - the decorator
there is a semantic error only).

Cumulative flip-scan with per-flip tsc adjudication: 7 toward tsc, 0 away
(the first attempt rejected the decorated static block - tsc accepts it -
and the scan caught it). The 'public @dec method()' sub-case still parses
through the modifier-named-field fallback; matching tsc's greedy modifier
commitment there needs the fallback's bare-name arm split, recorded in the
ROADMAP item. Gates 34/34, corpus parity 401/401, tree-sitter generate
clean on all 4 affected grammars, gate:treesitter green.

tsc's measured rule: '@' directly after a property on the SAME LINE binds
to that property ('Decorators must precede the name and all keywords of
property declarations') - 'x @dec y()' and 'x = 1 @dec y()' parse-reject,
while 'x; @dec y()' and a newline before '@' accept. Encoded exactly: the
field tails' no-';' ending carries not([sameLine, Decorator]) in both
grammars (alt([';'], [not([sameLine, Decorator])])). This also closes the
'public @dec method()' shape: the bare 'public' field reading now refuses
the same-line decorator, and the modifier reading correctly fails.

not() now accepts an array as a seq, like everywhere else in the rule DSL
(the NotNode conversion previously threw on arrays).
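A sketch of the normalization, assuming a toy RuleExpr shape rather than the DSL's real node types; toExpr and not here are illustrative:

```typescript
type RuleExpr =
  | { type: "lit"; text: string }
  | { type: "seq"; items: RuleExpr[] }
  | { type: "not"; expr: RuleExpr };
type RuleInput = string | RuleExpr | RuleInput[];

function toExpr(x: RuleInput): RuleExpr {
  if (typeof x === "string") return { type: "lit", text: x };
  if (Array.isArray(x)) return { type: "seq", items: x.map((y) => toExpr(y)) }; // arrays mean seq
  return x;
}

/** not() routes its argument through the same conversion as every other combinator. */
function not(x: RuleInput): RuleExpr {
  return { type: "not", expr: toExpr(x) };
}
```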

Cumulative flip-scan with per-flip tsc adjudication: 12 toward tsc, 0
away. Gates 34/34, corpus parity 401/401, tree-sitter generate clean x4,
gate:treesitter green.

The windowed-relex resync aligned candidates on kind/text/offset/end but
NOT on the token's flags - yet the gap BEFORE the candidate can sit inside
the edit: inserting '42' into '}\n  privat' leaves every token byte
identical from the candidate on while removing its preceding newline. The
old token was adopted with a stale newlineBefore, and anything reading the
flag downstream (sameLine assertions, comment-aware folds) diverged from a
fresh parse. Found by delta-debugging an edit/fresh divergence to a
690-char repro and diffing full streams including flags; the leaf tilings
were identical, which is why tree comparisons alone never caught it.

The window lex has already computed the candidate's true flags when the
resync fires (it lexed the gap), so the fix is one equality in the resync
condition: the pushed candidate's flags must match the old token's. A
mismatch just keeps lexing - the next candidate's gap lies beyond the
edit, so the flags converge and the regrow terminates.
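The strengthened resync equality, sketched with a hypothetical Token shape and NEWLINE_BEFORE bit (the real flag encoding is not shown in this PR text):

```typescript
const NEWLINE_BEFORE = 1;
interface Token { kind: number; text: string; offset: number; end: number; flags: number }

/** Can the old token be adopted in place of the freshly lexed candidate? */
function resyncMatches(pushed: Token, old: Token, shift: number): boolean {
  return (
    pushed.kind === old.kind &&
    pushed.text === old.text &&
    pushed.offset === old.offset + shift &&
    pushed.end === old.end + shift &&
    pushed.flags === old.flags // gap-derived flags such as newline-before must agree too
  );
}
```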

Gates: 34/34, lexer parity 5695 diff=0, incremental-grammars 672/672,
corpus parity 401/401.

Lands the full measured tsc class-member ruleset (probes 12/12, flip-scan
3-toward/0-away on top of the decorator-prefix + sameLine work already in):

- class-field ASI: a ';'-less field allows only a same-line '}' — 'x y',
  'x = 1 y = 2', 'var x = 1;' parse-reject; newline / ';' / '}' accept.
  Tail generalized to alt([';'], [not(sameLine)], [not(not('}'))]).
- modifier-vs-name: a modifier keyword followed by '('/'='/':'/';'/'?'/
  '!'/'<'/'{'/'}' is the member NAME, not a modifier ('public() {}',
  'static = 1', 'public public() {}').
- parse-tolerated member modifiers: declare (real), export/in/out
  (semantic errors tsc's parser accepts) — 'export Foo;', 'in a = 0;'.
- accessors take optional type params ('get x<T>()' parses).
- static-block arm takes a modifier prefix ('async static {}').
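The class-field tail decision in the first bullet, sketched as a function of the next token. NextTok and fieldTail are hypothetical, and EOF is not modeled (a class body always closes with '}'):

```typescript
interface NextTok { text: string; newlineBefore: boolean }

/** alt([';'], [not(sameLine)], [not(not('}'))]) after a ';'-less field. */
function fieldTail(next: NextTok): "consume" | "end" | "reject" {
  if (next.text === ";") return "consume";
  if (next.newlineBefore) return "end"; // not(sameLine)
  if (next.text === "}") return "end"; // not(not('}')): lookahead only, '}' is not consumed
  return "reject"; // e.g. 'x y', 'x = 1 y = 2', a same-line '@dec'
}
```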

The blocker was gen-cst-match: it drops parse-time not() guards and emits
GREEDY repeats, so [many(Modifier), 'static', Block] was destructurer-
ambiguous — the modifier-repeat swallowed the 'static' keyword leaf the
literal needed, and every static block failed to match. Fixed at the root:
a greedy loop / non-required optional now leaves at least minKids(suffix)
children for the required steps that follow it (threaded across nesting).
Proven a no-op on the parser's own trees — count + suffix-consumed = cc and
suffix-consumed >= minKids, so the cap cc-minKids never cuts below the
parser's actual count; it only blocks over-consumption a dropped guard used
to prevent. Verified: generated matchers byte-stable on all 7 grammars
before the recipe (cst-match-totality green), total after.
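The minKids cap, sketched over leaf labels for the [many(Modifier), 'static', Block] shape. greedyCount, isModifier and matchStaticBlock are hypothetical names, and the modifier set is cut down for illustration:

```typescript
/** Greedy repeat over kids[start..] that leaves `minKidsSuffix` kids for the required steps after it. */
function greedyCount(
  kids: readonly string[],
  start: number,
  isItem: (k: string) => boolean,
  minKidsSuffix: number,
): number {
  const cap = kids.length - start - minKidsSuffix;
  let n = 0;
  while (n < cap && isItem(kids[start + n])) n++;
  return n;
}

const isModifier = (k: string) => ["async", "public", "static"].indexOf(k) >= 0;

/** [many(Modifier), 'static', Block]; minKids(suffix) = 2 for the two required steps. */
function matchStaticBlock(kids: readonly string[], minKids = 2): boolean {
  const n = greedyCount(kids, 0, isModifier, minKids);
  return kids.length === n + 2 && kids[n] === "static" && kids[n + 1] === "Block";
}
```

With minKids = 0 the repeat takes both 'async' and 'static' and the match fails, which is the bug described above; with the cap it leaves two kids and the literal matches.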

The js/jsx tmLanguage shift (async/accessor between storage.modifier
buckets) is scope-gap-NEUTRAL (95.7% correct / 77.0% exact / +5.1pt gap,
byte-for-byte identical before/after); ts/tsx tmLanguage unchanged.

Error-recovery conformance: recall 61.2% -> 62.4%, first-error 59.7% ->
62.3%, precision 82.7% -> 83.4%, we-accept 103 -> 100. Gates 34/34, corpus
parity 401/401, tree-sitter generate clean x4, gate:treesitter 96.0%.

tsc parses an interface with REPEATED extends clauses
("interface I extends A extends B {}") — the parser accepts them, the
checker reports the duplicate. Mono's single opt('extends', sep(Type,','))
clause rejected the second extends, so the construct only "parsed" by
splitting into garbage statements. many('extends', sep(Type,',')) mirrors
tsc and produces the correct interface-with-heritage tree
(parserInterfaceDeclaration1-4, interfaceThatInheritsFromItself).
Accept-neutral on the corpus (the split path already accepted these),
gates 34/34, corpus parity 401/401, gate:treesitter 96.0%; also a
prerequisite for statement-level ASI (Task #24), which otherwise rejects
these as a mid-line split.
…l augmentation

tsc's parser accepts a leading modifier before any declaration (the checker
rejects invalid combinations); mono only had piecemeal opt('async') before
function and opt('abstract') before class, so "async class C {}" /
"abstract interface I {}" only "parsed" by splitting into garbage
statements. A modifier-prefix arm [alt('async','abstract'), Decl] tried
after the dedicated arms now produces the correct modifier+declaration
tree while leaving valid "async function" / "abstract class" flat.

Also adds the two declare forms mono was missing: ambient module shorthand
'declare module "foo";' (no body: the module arm requires braces) and
"declare global { ... }" (global-scope augmentation; global is a
contextual-keyword block, not a namespace name).

Accept-neutral on the corpus (the old split path already accepted these
invalid-but-parseable shapes), gates 34/34, corpus parity 401/401,
gate:treesitter 96.0%. Value is CST correctness for these constructs and
as prerequisites for statement-level ASI (Task #24) — though that lever
remains a large multi-area round (measured whack-a-mole: with these
companions in place, ASI still leaves ~19 distinct tsc-accepted shapes it
breaks across regex/divide, unique-symbol, import-type-args, protected,
comma-operator, etc., so it does not land incrementally).
… false-rejects

The modifier-prefix arm accepted only async/abstract before a declaration,
so tsc-clean files leading with another modifier on a declaration
(protected class, public interface, static interface, accessor class) were
outright rejected — not even split-parsed, since protected/public/etc. are
not expression starts. tsc's parser accepts any modifier before any
declaration (the checker rejects the invalid combination). Widen the
prefix to async/abstract/public/private/protected/readonly/static/
override/accessor.

Measured over the single-file conformance corpus: false-rejects
(tsc-parser-clean files mono throws on) drop from 19 to ZERO — mono now
parses every tsc-clean single-file conformance test. Additive and
over-accept-neutral: we-accept stays 100, recall 62.4%, gates 34/34,
corpus parity 401/401, gate:treesitter 96.0%.
…essors

Two more tsc-clean shapes mono outright rejected (false-rejects):
- "class C { static const H = 1; }" — tsc parses const as a (semantically
  invalid) member modifier; add it to the class-member modifier set, where
  the not()-followed-by-name-token guard still treats "const = 1" as a
  member NAMED const.
- "var v = { get foo() }" — an object-literal accessor with no body parses
  in tsc (error recovery); the accessor body becomes opt(Block).

Both additive and over-accept-neutral: compiler-corpus false-rejects drop
28 -> 24, conformance stays 0, we-accept stays 100, recall 62.4%, gates
34/34, corpus parity 401/401, gate:treesitter 96.0%.

tsc parses index signatures more leniently than mono did (the missing
annotations/commas are checker errors): a class index signature without a
value type ("class C { [x: string]; }") and a trailing comma inside the
bracketed params of a class or type-literal index signature ("type A = {
[key: string,]: string }"). Class index-sig value type becomes optional
with an opt(',') param tail; the type-literal index branch gains the same
opt(',').

Additive, over-accept-neutral: compiler-corpus false-rejects 24 -> 21,
conformance stays 0, we-accept 100, gates 34/34, parity 401/401,
gate:treesitter 96.0%.

The ?. continuation accepted member / call / index / template forms but
not a typed call (a?.<T>(args)) — a valid TS optional-chain instantiation
that mono wrongly rejected. Adds the ['<', sep(Type), '>', '(', sep(Expr),
')'] form. Gates 34/34, parity 401/401, conformance FN stays 0, we-accept
100, gate:treesitter 96.0%.

Adds the infrastructure for [Await]/[Yield] context-sensitive parsing via build-time
grammar name-forking (workflow-selected approach C, the only one that survives the
node-surgery reuse path: context becomes rule identity, which every reuse predicate
already keys on, so a cross-family reuse is structurally unrepresentable rather than
guarded).

- types.ts: an optional ctxMode on the transparent `group` RuleExpr, and a `canon`
  field on RuleDecl (a fork's base rule for every derived artifact).
- api.ts: awaitCtx / yieldCtx / asyncGenCtx / resetCtx combinators — transparent
  groups carrying ctxMode; every consumer but the fork transform treats them as plain
  groups, so no generator marker plumbing is needed.
- src/await-yield-fork.ts: withAwaitYield(grammar) — marker-driven multi-family rule
  closure (the reset boundary is explicit via resetCtx, open question #3), clone +
  per-family ref reroute, reserved-guard variants that forbid the context keyword,
  forks appended after base rules (rid/entry-last preserved), canon set.
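A miniature of the closure, reroute and append steps, restricted to a single await family; the reserved-guard variants and the yield/async-generator families are omitted. Expr, Rule and withAwait are hypothetical stand-ins for the real RuleExpr/RuleDecl types and withAwaitYield:

```typescript
type Expr =
  | { kind: "lit"; text: string }
  | { kind: "ref"; name: string }
  | { kind: "seq"; items: Expr[] }
  | { kind: "group"; ctx?: "await" | "reset"; body: Expr };
interface Rule { name: string; body: Expr; canon?: string }

/** Collect rule refs in `e` without crossing a reset boundary. */
function refsOutsideReset(e: Expr, out: Set<string>): void {
  if (e.kind === "ref") out.add(e.name);
  else if (e.kind === "seq") e.items.forEach((x) => refsOutsideReset(x, out));
  else if (e.kind === "group" && e.ctx !== "reset") refsOutsideReset(e.body, out);
}

/** Refs that sit under an await marker (and not under a nested reset). */
function awaitRoots(e: Expr, out: Set<string>): void {
  if (e.kind === "seq") e.items.forEach((x) => awaitRoots(x, out));
  else if (e.kind === "group") {
    if (e.ctx === "await") refsOutsideReset(e.body, out);
    else awaitRoots(e.body, out);
  }
}

function withAwait(rules: readonly Rule[]): Rule[] {
  const byName = new Map<string, Rule>();
  rules.forEach((r) => byName.set(r.name, r));
  // 1. Family closure: every rule reachable from an await marker without crossing a reset.
  const roots = new Set<string>();
  rules.forEach((r) => awaitRoots(r.body, roots));
  const family = new Set<string>();
  const work = Array.from(roots);
  while (work.length > 0) {
    const name = work.pop()!;
    if (family.has(name)) continue;
    family.add(name);
    const next = new Set<string>();
    refsOutsideReset(byName.get(name)!.body, next);
    work.push(...Array.from(next));
  }
  // 2. Reroute: inside an await context, refs to family rules go to their fork.
  const fork = (name: string) => `${name}$A`;
  const reroute = (e: Expr, inAwait: boolean): Expr => {
    switch (e.kind) {
      case "lit":
        return e;
      case "ref":
        return inAwait && family.has(e.name) ? { kind: "ref", name: fork(e.name) } : e;
      case "seq":
        return { kind: "seq", items: e.items.map((x) => reroute(x, inAwait)) };
      case "group":
        return { ...e, body: reroute(e.body, e.ctx === "await" ? true : e.ctx === "reset" ? false : inAwait) };
    }
  };
  // 3. Base rules keep their positions; forks are appended after them with canon set.
  const base = rules.map((r) => ({ ...r, body: reroute(r.body, false) }));
  const forks = rules
    .filter((r) => family.has(r.name))
    .map((r) => ({ name: fork(r.name), body: reroute(r.body, true), canon: r.name }));
  return [...base, ...forks];
}
```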

NOT wired into any grammar yet (no marker uses) — a pure no-op: gates 34/34, all 7
generated outputs byte-identical. Verified the transform algorithm standalone on
synthetic grammars (correct closure, reroute, guard, no dups, per-family). Next: the
emitted-parser/cst-match canon plumbing, then route the TS/JS async/generator bodies.
…ist)

The emitted parser gains RULE_DISPLAY (RuleDecl.canon ?? name) used by ruleNameOf
and the $missing "expected X" message, while RULE_NAMES stays unique for memo/
adoption rule identity and the entry indexOf. The interpreter parser stamps a node's
`rule` field with canon ?? name the same way. So a forked rule (Block$A) reports its
base name (Block) on the green node — trees byte-identical to the base grammar — while
the distinct rule identity drives the memo/adoption key.
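A sketch of the two name tables, assuming a minimal RuleDecl shape; ruleTables is a hypothetical helper, not the emitter's code:

```typescript
interface RuleDecl { name: string; canon?: string }

function ruleTables(rules: readonly RuleDecl[]) {
  const RULE_NAMES = rules.map((r) => r.name); // unique: memo/adoption identity, entry indexOf
  const RULE_DISPLAY = rules.map((r) => r.canon ?? r.name); // what trees and "expected X" show
  const ruleNameOf = (rid: number) => RULE_DISPLAY[rid];
  return { RULE_NAMES, RULE_DISPLAY, ruleNameOf };
}
```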

Identical to RULE_NAMES when no rule is forked: gates 34/34, emit==interp corpus
parity 401/401, all generated outputs byte-identical. The derived-artifact generators
(AST/TM/tree-sitter/cst-match) need fork handling only once a grammar actually forks;
deferred to the wiring step.

emitParser and the interpreter createParser now apply withAwaitYield to their input
grammar, so the [Await]/[Yield] forks live ONLY in the parser rule-identity / memo /
adoption space. The derived-artifact generators (AST / TM / tree-sitter / cst-match)
keep seeing the base grammar with the transparent-group ctx markers and so need no
fork handling for their output — the markers are invisible to them.

Verified byte-identical on typescript.ts (no ctx markers ⇒ empty closure ⇒ no forks):
the emitted parser is diff-identical before/after, gates 34/34, emit≡interp parity
401/401. cst-match's rid-space agreement + the grammar marker wiring come next.

gen-cst-match applies the same withAwaitYield fork so its rule-id space matches the
parser's tree, emits matchers/types for BASE rules only (a fork collapses to its
base), and canonicalizes the CHILD side of every rule-id check through a RULE_CANON
table (__nodeOf and the first-child dispatch switches), so a base matcher accepts a
forked child node. RULE_CANON is the identity map without ctx forks: gates 34/34
including cst-match-totality, generated outputs byte-identical.
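A sketch of the RULE_CANON table and the child-side check, assuming rule ids are array positions; buildRuleCanon and childHasRule are hypothetical:

```typescript
interface RuleDecl { name: string; canon?: string }

/** RULE_CANON[rid] = rid of the rule's base; the identity map when nothing is forked. */
function buildRuleCanon(rules: readonly RuleDecl[]): number[] {
  const ridOf = new Map<string, number>();
  rules.forEach((r, i) => ridOf.set(r.name, i));
  return rules.map((r) => ridOf.get(r.canon ?? r.name)!);
}

/** Child-side check in a base matcher: a forked child counts as its base rule. */
function childHasRule(RULE_CANON: readonly number[], childRid: number, expectedBaseRid: number): boolean {
  return RULE_CANON[childRid] === expectedBaseRid;
}
```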

All [Await]/[Yield] fork infrastructure is now in place and proven non-regressive
(markers + transform + parser/cst-match canon, every step a verified byte-identical
no-op). Remaining: wire the ctx markers into the JS/TS grammar (split async/non-async
arms, mark bodies+params and reset boundaries), then gate strict acceptance + the new
function<->async-function generative edit-class.
…en end-to-end)

First REAL behavioral use of the fork. javascript.ts splits the arrow arms into
async / non-async so each routes to the right rule family: an async arrow wraps its
params and body in awaitCtx (await is the operator, no identifier reading), a plain
arrow's body in resetCtx (context resets). The reservation is carried by a `reservable` flag on the
notReservedExpr / notReserved guards (they are inline `not(alt(...))`, not rules, so
the earlier rule-fork path could not reach them); withAwaitYield's rewrite extends a
reservable guard with the family's keyword, so `await`/`yield` lose their identifier
reading inside the context. cst-match's MATCHERS_BY_ID maps a fork rid to its base
matcher, and the expected rids in __nodeOf / dispatch switches are canon-baked.
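The reservable-guard rewrite, sketched with hypothetical family tags ("$A", "$Y", "$AY") and a guard modeled as a word set plus the reservable flag:

```typescript
type Family = "base" | "$A" | "$Y" | "$AY";
interface ReservedGuard { words: ReadonlySet<string>; reservable: boolean }

/** Extend a reservable guard with its family's context keyword(s). */
function guardFor(guard: ReservedGuard, family: Family): ReservedGuard {
  if (!guard.reservable || family === "base") return guard; // non-reservable guards never change
  const extra = family === "$A" ? ["await"] : family === "$Y" ? ["yield"] : ["await", "yield"];
  return { words: new Set(Array.from(guard.words).concat(extra)), reservable: true };
}

/** A bare word keeps its identifier reading only if the guard does not forbid it. */
const identifierAllowed = (guard: ReservedGuard, word: string) => !guard.words.has(word);
```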

Verified: async (a = await) => 0 REJECTS, async () => await REJECTS, function f(a =
await){} ACCEPTS, async () => await x ACCEPTS, x => x / async (x) => x ACCEPT — exactly
tsc. And the whole thing holds the incremental guarantees: gates 34/34 (including
incremental-grammars edit≡fresh over JS, cst-match-totality, emit≡interp parity),
all generated outputs byte-identical. The fork preserves window-replay because the
context IS the rule identity. Async functions / methods / generators (and typescript.ts)
are the same pattern, wired next.

typescript.ts mirrors the javascript.ts arrow split + context markers (async arrow
params/body await-context, plain arrow body reset; type params/annotations stay
plain — they are not [Await]-parameterized). `async (a = await): Promise<void> => {}`
now rejects (await needs an operand), while every valid async arrow (`async (x):
Promise<void> => await x`, `async <T>(x: T) => await x`) and non-async default
(`(a = await) => 0`, `function f(a = await){}`) still parses.

Error-recovery conformance: we-accept 100 -> 91 (the async-arrow over-accepts
cleared), recall 62.4% -> 63.3%, first-error 62.3% -> 64.8%, FN stays 0. Gates 34/34,
emit≡interp parity 401/401, byte-identical generated outputs, tree-sitter generate
clean x4, gate:treesitter 96.0%. Async functions / methods / generators next (same
pattern, more productions).

javascript.ts function-expression production 4-way split (plain / generator / async /
async-generator), each routing its params and body to the right [Await]/[Yield] family.
The await family is now correct for function expressions: `async function(){ let await=1 }`
rejects (await reserved), `async function(){ return await x }` and `async function*(){
yield await x }` parse. Valid JS unaffected (parity 0/0/0, 34/34, gate:treesitter 96.0%).

The yield family is routed (generator bodies -> $Y) but not yet fully reserved: `yield`
is a dedicated Expr arm present in every family, so `yield 1` outside a generator and
`function* g(a = yield){}` still over-accept — fixing that needs a family-conditional-arm
mechanism (next).

…bodyless-fallback escape

Two changes land the [Await]/[Yield] context across all function *declarations*
(JS + TS), not just expressions and arrows:

1. fnArms / tsFnArms helpers generate the four async×generator arms (plain / generator
   / async / async-generator) for every `function` form, routing each arm's params and
   body to its family. Applied at all six sites (JS function expr/decl/export-default,
   TS the same with type params + return type kept plain). `async` is split out of the
   Decl modifier-prefix soup with a `not('function')` guard so `async function` must
   take the context-bearing arm instead of being re-accepted as a plain function with
   a stray `async` modifier.

2. Root-cause fix for a whole class of error-masking: a TS function declaration body
   was `alt(Block, opt(';'))`, so when Block failed (e.g. an [Await]-context violation
   like `async function g(){ let await=1 }`) the parser fell through to the bodyless
   `;` signature form, parsed a zero-body declaration, and re-parsed the `{...}` as a
   separate block statement in plain context — silently accepting the error. Guarding
   the bodyless form with `not('{')` makes a present `{` commit to the Block body, so a
   body parse error stays an error. Overload/ambient signatures (`function f(): T;`)
   still parse (no `{`).
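
A toy PEG over token arrays shows the fall-through in item 2 and what the `not('{')` guard changes. The combinators are minimal stand-ins, not the project's `alt`/`opt`/`not`. The `BAD` token is a hypothetical marker for any context violation inside the body.

```typescript
// Tiny PEG over a token array: a parser returns the new position, or -1 on failure.
type P = (toks: string[], pos: number) => number;

const lit = (t: string): P => (toks, pos) => (toks[pos] === t ? pos + 1 : -1);
const opt = (p: P): P => (toks, pos) => {
  const r = p(toks, pos);
  return r < 0 ? pos : r;
};
// Zero-width negative lookahead.
const not = (p: P): P => (toks, pos) => (p(toks, pos) < 0 ? pos : -1);
const seq = (...ps: P[]): P => (toks, pos) => {
  for (const p of ps) {
    pos = p(toks, pos);
    if (pos < 0) return -1;
  }
  return pos;
};
// Ordered choice: first arm that succeeds wins.
const alt = (...ps: P[]): P => (toks, pos) => {
  for (const p of ps) {
    const r = p(toks, pos);
    if (r >= 0) return r;
  }
  return -1;
};

// Stand-in Block: `{ ... }`, failing if the (hypothetical) "BAD" violation marker is inside.
const block: P = (toks, pos) => {
  if (toks[pos] !== "{") return -1;
  const close = toks.indexOf("}", pos);
  if (close < 0 || toks.slice(pos + 1, close).indexOf("BAD") >= 0) return -1;
  return close + 1;
};

// Before: a failed Block falls through to the bodyless form with zero width,
// leaving `{...}` to be re-parsed later as a separate plain-context block.
const bodyBefore = alt(block, opt(lit(";")));
// After: a present `{` commits to Block; the bodyless form requires no `{`.
const bodyAfter = alt(block, seq(not(lit("{")), opt(lit(";"))));
```

With the guard, a broken body now fails in place instead of succeeding as an empty signature. Overload signatures (`;`, or nothing at all) still take the second arm.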

Error-recovery conformance: we-accept 91 -> 81, recall 63.30% -> 64.46%, first-error
64.79% -> 67.32%, FN stays 0. Gates 34/34, parity 0/0/0, byte-identical generated
outputs, tree-sitter generate clean x4, gate:treesitter 96.0%.

`x => …`'s shorthand parameter was a bare `Ident`, so a contextual keyword that lexes
as an identifier (`await`/`yield`) slipped through as a parameter name even inside an
[Await]/[Yield] context — the parenthesized form already routed through `notReserved`
via Param, but the shorthand bypassed it. Guard it with `notReserved` (non-async arm,
which inherits the enclosing family so `await => …` rejects nested in async params but
parses standalone) and `awaitCtx(notReserved, Ident)` (async arm, always [+Await]).

This is exactly the nested-arrow-parameter shape the spec calls out:
`async function foo(a = await => await)` and `async (a = await => await) =>` now reject
(await is the inner arrow's [+Await] parameter), while `await => await` standalone and
every ordinary arrow still parse. `async function f(a = yield => yield)` stays accepted
(async is not a generator, so yield is a valid identifier there).

Error-recovery conformance: we-accept 81 -> 74, recall 64.46% -> 65.83%, first-error
67.32% -> 69.30%, FN stays 0. Gates 34/34, parity 0/0/0, tree-sitter clean x4, 96.0%.

A class static block's statement list is [+Await] per spec (ClassStaticBlockBody :
ClassStaticBlockStatementList[~Yield, +Await, ~Return]), so `await` is reserved inside
it: `static { await; }`, `static { let await = 1; }` now reject, while a static block
with ordinary statements and a nested non-async `function f(await){}` (whose own
parameters reset to no context) still parse. Wraps the static block's Block body in
awaitCtx; decorators/modifiers on the block keep parsing (they are semantic errors).

Error-recovery conformance: we-accept 74 -> 73, recall 65.83% -> 66.67%, first-error
69.30% -> 69.58%, FN stays 0. Gates 34/34, parity 0/0/0, tree-sitter clean x4, 96.0%.

…lenient

tsc's PARSER accepts await/yield as binding identifiers even inside an async/generator
body (`async function f(){ let await = 1 }`, `function* g(){ function yield(){} }`) —
the "reserved word" rule there is a checker diagnostic, not a parse error. Only at
EXPRESSION position does tsc reject, because `await` must be the operator and so needs
an operand (`await;`, `await =>`, `a = await` -> "Expression expected").

The earlier fork made `notReserved` (the binding guard) reservable too, which
false-rejected those lenient bindings. Drop that: only `notReservedExpr` (the
expression identifier-NUD guard) carries the [Await]/[Yield] reservation, and the
single-identifier arrow parameter now guards with `notReservedExpr` so `await => x`
rejects in an await context via the same operator-needs-operand path tsc uses (it
parses the arrow head as an expression first), while `let await`/`var yield`/named
`function yield(){}` parse everywhere.
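
The split between the two guards can be sketched as two predicates. This is a simplification with stated assumptions. The reserved-word set is a hypothetical subset. The context is also passed as a flag purely for brevity, whereas the PR bakes it into forked rule names.

```typescript
// Enclosing [Await]/[Yield] context (a flag here only for illustration).
interface Ctx {
  inAwait: boolean;
  inYield: boolean;
}

// Hypothetical subset of words that are never identifiers.
const ALWAYS_RESERVED = new Set(["break", "class", "const", "function", "in", "new", "return", "var"]);

// notReserved: binding positions (`let x`, `function x(){}`, parameter names). Stays lenient.
function notReserved(word: string): boolean {
  return !ALWAYS_RESERVED.has(word);
}

// notReservedExpr: an identifier used as an expression (identifier NUD).
// In an [Await] context `await` must be the operator, so it cannot stand alone here.
function notReservedExpr(word: string, ctx: Ctx): boolean {
  if (!notReserved(word)) return false;
  if (ctx.inAwait && word === "await") return false;
  if (ctx.inYield && word === "yield") return false;
  return true;
}
```

The single-identifier arrow head is checked with `notReservedExpr`, so `await => x` fails inside an await context. `let await = 1` only meets `notReserved`, so it parses.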

Measured bidirectionally over the single-file conformance corpus: false-rejects of tsc-accepted
files drop (the await/yield-binding FN, asyncOrYieldAsBindingIdentifier1, is gone);
over-accepts unchanged (they were always expression-position). recovery-conformance
recall 66.35%, first-error 69.58%, we-accept 73. Gates 34/34, parity 0/0/0, 96.0%.

… async

Class members and object-literal properties now route method params/bodies to their
[Await]/[Yield] family instead of leaking the enclosing context: plain methods,
constructors, accessors and field initializers reset (a method body has its OWN,
non-inherited context — the spec's implicit function boundary), generators yield,
async await, async-generators both. A computed key `[e]` stays OUTSIDE the family (it
is evaluated in the enclosing context), so `class C { [await](){} }` inside async still
rejects while the method bodies don't.
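
The routing rule for class members can be written as a small function from a member's shape to the family of each part. This is a sketch with stated assumptions. `Member`, `memberFamilies` and the field names are hypothetical. Non-method kinds, including the async-accessor arms, are simplified to the reset family.

```typescript
type Family = "" | "$A" | "$Y" | "$AY";

// Hypothetical member shape.
interface Member {
  kind: "method" | "constructor" | "accessor" | "field";
  isAsync?: boolean;
  isGenerator?: boolean;
  computedKey?: boolean;
}

// The member's OWN family, from its own modifiers only.
function ownFamily(isAsync: boolean, isGenerator: boolean): Family {
  if (isAsync && isGenerator) return "$AY";
  if (isAsync) return "$A";
  if (isGenerator) return "$Y";
  return "";
}

// `enclosing` is the family of the code around the class.
function memberFamilies(m: Member, enclosing: Family): { key: Family; body: Family } {
  // A computed key `[e]` is evaluated in the enclosing context; a plain name has no expression.
  const key: Family = m.computedKey ? enclosing : "";
  // Method bodies get their own family; constructors, accessors and field
  // initializers reset (simplification: async-accessor arms are not modelled).
  const body: Family = m.kind === "method" ? ownFamily(!!m.isAsync, !!m.isGenerator) : "";
  return { key, body };
}
```

Nothing in the body computation reads `enclosing`. This is the non-inherited boundary described above. The computed key is the only part that sees the outer context.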

`async` is pulled out of the member modifier soup into dedicated arms (the class analog
of the Decl/arrow fix) so the body gets its await context — but tsc parses `async` as an
ORDER-FREE modifier (`async static m`, `override async m`, `async get x`, `async static
{}` all parse, the checker validates), so each async arm carries its own inner
many(Modifier) run and there are async-accessor / async-static-block arms. The `static`
modifier's `not('{')`-style guard keeps `async static {}` parsing the block, not eating
`static` as a modifier.

This closes the class-body context leak: `async function f(){ class C { m(){ await; } } }`
and `{ x = await }` field initializers now parse (method/initializer reset), matching
tsc's parser; over the single-file conformance corpus the await/yield false-rejects are
gone (FN drops to 2 pre-existing externalModules import-feature cases, unrelated). Async
methods reject `await;`/`await =>` like async functions do.

recovery-conformance unchanged at recall 66.35%, first-error 69.58%, we-accept 73 (the
method await cases were never in the single-file set). Gates 34/34, parity 0/0/0,
byte-identical generated outputs, tree-sitter generate clean x4, gate:treesitter 96.0%.

The random mutator only hits an async/generator toggle by luck, yet that edit is the
whole reason the context is a build-time name-fork rather than a runtime flag: flipping
`async`/`*` on an enclosing function changes its body's RULE IDENTITY (Block ->
Block$A/$Y/$AY), and a runtime flag read by core() but absent from the reuse key would
let a stale cross-family row survive. This adds a scripted edit class over hand-authored
async/generator documents — drop/re-add `async`, drop a generator `*`, edit an async
arrow's params, a yield operand, a class method's async/`*` — interleaved with a
surgery-path in-body keystroke, asserting each stays edit≡fresh + self-consistent.
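
A minimal cache keyed by rule name and start offset shows why the family must be part of the reuse key. `ReuseCache`, `Row` and `bodyRule` are hypothetical. Offsets are assumed to be already mapped through the edit. The only point is that a fork suffix in the rule name changes the key, whereas a runtime flag would not.

```typescript
type Family = "" | "$A" | "$Y" | "$AY";

// Hypothetical cached parse row.
interface Row { rule: string; start: number; text: string }

// Hypothetical reuse cache: a row is adopted only if (rule, start) matches exactly.
class ReuseCache {
  private rows = new Map<string, Row>();
  private key(rule: string, start: number): string {
    return `${rule}@${start}`;
  }
  put(row: Row): void {
    this.rows.set(this.key(row.rule, row.start), row);
  }
  get(rule: string, start: number): Row | undefined {
    return this.rows.get(this.key(rule, start));
  }
}

// Name-fork: the body's rule identity carries the family (Block, Block$A, Block$Y, Block$AY).
function bodyRule(isAsync: boolean, isGenerator: boolean): string {
  const fam: Family = isAsync && isGenerator ? "$AY" : isAsync ? "$A" : isGenerator ? "$Y" : "";
  return "Block" + fam;
}
```

With a runtime flag instead, the post-edit lookup would still ask for plain `Block` at the same offset. It would then adopt a row that was parsed with `await` as an identifier.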

706/706 steps equal+consistent across all 7 grammars: the name-fork preserves the
window-replay theorem verbatim under exactly the edits it exists to survive.

A `using` / `await using` declaration binds a plain BindingIdentifier, never a pattern.
UsingBinding replaces the pattern-allowing Binding/ForBinding in the using arms, so
`using [a] = null` falls through to the expression `using[a] = null` — which is exactly
how tsc reads sync `using` in statement position (it is a contextual identifier there),
so the tree now matches instead of minting a bogus using-declaration with a pattern.

The `await using [a]` parse-error tsc reports is NOT cleared by this alone: it is
statement-ASI-gated — mono still splits `await using` off `[a] = null` into two
statements (the Task #24 gap), so the over-accept stands until the ASI round, which this
identifier-only binding is a prerequisite for (the await-using arm must reject the
pattern once ASI stops the split). Accept-neutral: recovery-conformance unchanged
(we-accept 73, recall 66.35%, first-error 69.58%), 34/34, parity 0/0/0, tree-sitter
clean, gate:treesitter 96.0%.

UsingBinding cleared no over-accept: `await using [a] = null` over-accepts via
STATEMENT-SPLITTING (the ASI gap, #24): mono splits `await using` off `[a] = null`
into two statements regardless of the binding shape (proven: `x using [a]` splits the
same way). So an identifier-only using binding only shuffles trees tsc rejects anyway,
and it introduced a tree-sitter GLR conflict (`using x: T <` vs a generic type) — 9c04bc0
committed the stale grammar.js because the `tree-sitter generate` failure was swallowed
by the `|| echo FAIL` in the gate chain.

The identifier-only using binding + an `await using [` ExpressionStatement commit guard
are the correct fix, but they only clear the over-accept once ASI stops the split, so
they belong with the ASI round (#24), not as a standalone companion that adds a GLR
conflict for zero acceptance gain. Restores Binding in the using arms; 34/34, parity
0/0/0, tree-sitter generate clean x4, gate:treesitter 96.0%.

… new false-rejects

The TS statement terminator becomes asi() = alt([';'], [not(sameLine)], [not(not('}'))])
on every Stmt-level arm (var/let/const, return, throw, break, continue, debugger, using,
expression statement): a statement may end only at ';', a line-terminator before the next
token, or a closing '}'. A same-line non-';'/'}' token can no longer terminate it, so the
mid-line splits mono used to accept by exploiting the optional ';' (`var x = a[]` split
into `var x=a` + `[]`) now stay one statement and reject like tsc.
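
The three alternatives of `asi()` can be mirrored over a token array that records whether a line break precedes each token. This sketch assumes the expression parser has already consumed everything it can, so `asi()` only decides whether the statement may end at this point. `Tok` and the toy `lex` are assumptions, not taken from the PR. The same goes for the end-of-input case, which ECMAScript ASI also treats as a terminator.

```typescript
// A token plus whether a line terminator precedes it (hypothetical shape).
interface Tok {
  text: string;
  nlBefore: boolean;
}

// asi() = alt([';'], [not(sameLine)], [not(not('}'))])
// Returns the position after the terminator, or -1 if the statement may not end here.
function asi(toks: Tok[], pos: number): number {
  const t = toks[pos];
  if (t === undefined) return pos;    // assumption: end of input also ends a statement
  if (t.text === ";") return pos + 1; // [';'] consumes the semicolon
  if (t.nlBefore) return pos;         // [not(sameLine)]: zero-width at a line break
  if (t.text === "}") return pos;     // [not(not('}'))]: positive lookahead, '}' not consumed
  return -1;                          // any other same-line token cannot end the statement
}

// Toy lexer: tokens are separated by single spaces or newlines.
function lex(src: string): Tok[] {
  const out: Tok[] = [];
  let nl = false;
  for (const piece of src.split(/( |\n)/)) {
    if (piece === "\n") nl = true;
    else if (piece !== " " && piece !== "") {
      out.push({ text: piece, nlBefore: nl });
      nl = false;
    }
  }
  return out;
}
```

`var x = a [ ]` now has no terminator at `[`, so the statement cannot split there. A `;`, a line break, or a closing `}` still ends it.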

asi alone false-rejects every tsc-clean construct that legitimately continues a statement
without a ';'. A multi-agent workflow mapped the full set (41 single-file conformance
cases) to 11 companions — each a MISSING production asi merely EXPOSED (base only
"accepted" them via the same split it removes), so every fix lives in the arm asi
exposed, never in asi() itself:
  - per-specifier `type` modifier on import/export specifiers, with tsc's multi-token
    `{ type as as B }` / `{ type type as foo }` disambiguation
  - `export type *` / `export type * as ns from` + ModuleExportName namespace alias
  - `import type X = require()` (type-only import-equals; two arms so `import type = …`
    keeps `type` as the binding name)
  - interface heritage via the shared heritageClauses helper (implements / `extends Foo?.Bar`
    / empty `extends {` / self / repeated)
  - leading modifier soup before var/let/const/using (mirrors the decorator-prefix arm)
  - nested `new new Foo()()` (recursive NewTarget; + ['new_target'] tree-sitter conflict)
  - `export as namespace X` + `export default interface`
  - `import<T>(...)` instantiation expression
  - regex flag tail = maximal-munch IdentifierPart run (tsc lexes flags leniently)
  - non-null `!` is a restricted (no-line-break) postfix, like `++`/`--`
  - `unique` as a general prefix type operator (`unique <Type>`)

The workflow's const/var->notReservedExpr companion was MEASURED net-negative (it
regresses `for (var of X)` + `[...x = a]`, both tsc-parse-clean) and dropped; its lone
target (importWithTypeArguments) is covered by the import<T> arm instead.

recovery-conformance: we-accept 73 -> 50 (-23 mid-line-split over-accepts), recall
66.35% -> 69.82%, first-error 69.58% -> 74.37% (precision dips 84% -> 67% as mono now
REPORTS errors on the 23 newly-rejected files at a coarser granularity than tsc — the
known recovery-granularity gap, not new false-rejects: bidirectional FN stays 2, both
pre-existing externalModules import-feature cases). recovery.ts VALID fixture swapped
parserRealSource7 (a tsc PARSE-ERROR file that only passed via the split bug) ->
parserRealSource12. Gates 34/34, parity 0/0/0, tree-sitter generate clean x4,
gate:treesitter 96.0%.

…t type-args

Two CFG/lexer-landable over-accepts from the 50-file triage (workflow mapped 49 landable
/ 2 semantic-ceiling):

1. numeric-literal-lex (10 files): a decimal integer part is a single `0` or a `[1-9]`-led
   run — `0` immediately followed by a digit (legacy octal `0123`, leading-zero `09`) is
   not a decimal literal. intPart='0' lets the trailing digit trip numericTailGuard so the
   token fails and the total lexer rejects it (tsc's scanner behavior). fracTail/expTail/
   BigInt keep `digits` (leading zeros legal: `0.012`, `1e007`, `0n`); radix tokens
   untouched. `0`, `0.5`, `0e1`, `1_000`, `0x1f` stay valid.

2. type-arg-sameLine (1 file): generic type-argument application `T<A>` is newline-
   sensitive — `T\n<A>` rejects, mirroring the existing `[$, sameLine, '[']` / `!` postfix
   type arms.
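
The decimal-literal rule in item 1 can be approximated with anchored regexes plus a tail check. The regexes and the `tailGuard` stand-in for `numericTailGuard` are assumptions written for illustration. Legacy-octal and leading-zero inputs fail because `intPart` stops after the lone `0` and the next character is a digit.

```typescript
// Illustrative regexes (assumptions, not the project's lexer tables).
const digits = "[0-9](?:_?[0-9])*";             // leading zeros allowed, `_` separators
const intPart = "(?:0|[1-9](?:_?[0-9])*)";      // a lone 0, or a [1-9]-led run
const fracTail = `(?:\\.(?:${digits})?)`;
const expTail = `(?:[eE][+-]?${digits})`;

const radixLit = /^0(?:[xX][0-9a-fA-F](?:_?[0-9a-fA-F])*|[oO][0-7](?:_?[0-7])*|[bB][01](?:_?[01])*)n?/;
const bigintLit = new RegExp(`^${digits}n`);
const decimalLit = new RegExp(`^${intPart}${fracTail}?${expTail}?`);

// Stand-in for numericTailGuard: a numeric token may not run straight into a digit or identifier char.
const tailGuard = /^[0-9A-Za-z_$]/;

// Length of the numeric token at the start of `src`, or -1 if it is rejected.
function lexNumber(src: string): number {
  for (const re of [radixLit, bigintLit, decimalLit]) {
    const m = re.exec(src);
    if (m === null) continue;
    return tailGuard.test(src.slice(m[0].length)) ? -1 : m[0].length;
  }
  return -1;
}
```

Fractions, exponents and BigInt keep the unrestricted `digits`, so `0.012` and `1e007` still lex. Only the integer part is restricted.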

recovery-conformance: we-accept 50 -> 39, recall 69.82% -> 72.77%, first-error 74.37% ->
77.75%, precision ~stable. Bidirectional FN 0 (handle API). Gates 34/34, parity 0/0/0,
tree-sitter generate clean x4, gate:treesitter 96.0%.

…g separator

Four more CFG-landable over-accepts from the 50-file triage:
- `let [` at statement start commits to a LexicalDeclaration (added to the
  expression-statement lookahead guard), so a bad `let [...]` head rejects instead of
  parsing as a `let`-indexed expression.
- `new <T>Foo()` rejects: a `<` may not directly follow `new` (the operand is a
  MemberExpression) — `not('<')` on the `new` arms; post-callee `Foo<T>()` type-args stay.
- a labeled-statement / for-binding-property label is `notReserved` (a reserved word can
  never be an Identifier-slot label).
- a class index-signature ends with the asi() member terminator (`; / newline / }`),
  not a bare optional `;`, so a same-line adjacent member rejects.
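
The first two bullets are lookahead guards. The sketch below models them on the lookahead restriction ECMAScript places on ExpressionStatement (`{`, `function`, `async function`, `class`, `let [`). The function names are hypothetical, and the project's actual guard list may differ.

```typescript
// Hypothetical expression-statement lookahead guard over leading tokens.
function expressionStatementMayStart(toks: string[]): boolean {
  const [a, b] = toks;
  if (a === "{" || a === "function" || a === "class") return false;
  // The spec also requires no line break between `async` and `function`; not modelled here.
  if (a === "async" && b === "function") return false;
  // `let [` commits to a LexicalDeclaration, so a bad `let [...]` head
  // cannot fall back to indexing an identifier named `let`.
  if (a === "let" && b === "[") return false;
  return true;
}

// Hypothetical `new` guard: the operand is a MemberExpression, so `<` may not follow `new` directly.
function newOperandMayStart(toks: string[], newPos: number): boolean {
  return toks[newPos] === "new" && toks[newPos + 1] !== "<";
}
```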

(The type-literal member separator was tried in the same asi() shape but REVERTED: it
regresses `var x: { private y: string }` — tsc reads `private y` as two lenient members
with no separator, which requires TypeMember modifier support, a separate change.)

recovery-conformance we-accept 39 -> 36, FN 0 (handle API). Gates 34/34, parity 0/0/0,
tree-sitter generate clean x4.

…ntextual name)

A type-parameter NAME guards through `notReserved, Ident`. `in` LEXES as an Ident, so an
un-guarded Ident wrongly accepted it as the name — but `in` is a reserved word there:
tsc rejects `<in>` / `<in in>` / `<out in>` / `<in = any>` ("'in' is a reserved word that
cannot be used here") while accepting `<out>` / `<out out>` / `<in out>` (out is a
contextual keyword, a valid name) and every modifier use (`<in T>` / `<in out T>` /
`<const T>` — `in` stays a variance modifier). Guards all three TypeParam arms (the
modifier-soup arm's name too, since `many(mod)` greedily eats trailing `in`s).
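
The intended acceptance rule can be stated as a plain function over one type parameter's tokens. This is deliberately not the three-arm PEG. It treats the last token before any `= Default` as the name and everything before it as modifiers. The reserved-word set is a hypothetical subset.

```typescript
// Hypothetical subsets: reserved words that can never be a type-parameter name,
// and the variance/const modifiers.
const RESERVED = new Set(["in", "const", "class", "new", "var"]);
const MODIFIERS = new Set(["in", "out", "const"]);

// `toks` is one type parameter, e.g. ["in", "out", "T"] or ["out", "=", "any"].
function typeParamOk(toks: string[]): boolean {
  const eq = toks.indexOf("=");
  const head = eq < 0 ? toks : toks.slice(0, eq); // drop a `= Default` tail
  if (head.length === 0) return false;
  const paramName = head[head.length - 1];
  const mods = head.slice(0, -1);
  // Everything before the name must be a modifier; the name itself goes through notReserved.
  return mods.every((m) => MODIFIERS.has(m)) && !RESERVED.has(paramName);
}
```

`out` is a contextual keyword, so it passes as a name. `in` is reserved, so it can only appear as a modifier.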

test/refactor-guard.ts had codified the old over-accept: its SHOULD-PASS `tp name-in
default` = `interface I<in = any> {}` is a tsc PARSE ERROR — corrected to the valid
`out` analog `interface I<out = any> {}`.

recovery-conformance we-accept 36 -> 35, FN 0. incremental-grammars 706/706 (the tripwire
that rejected the super-primary attempt; this one keeps edit≡fresh). Gates 34/34,
refactor-guard 112/112, tree-sitter generate clean x4.

`npm run check` ran its 35 gates strictly serially (execFileSync in a for-loop), so the
wall-clock was the SUM of every gate. Each gate is an independent subprocess that emits
its own parser and reads its own corpus, sharing no mutable state and writing DISTINCT
/tmp/emitted-*.mjs files — so they parallelize safely. A (cpus-2)-wide worker pool turns
the wall-clock into ~max(sum/pool, slowest-gate): measured 19.4s (was minutes), now bound
by the single slowest gate (exhaustive-edits ~18s). Results stream as each finishes; the
final pass/fail summary prints in gate order and the exit code is unchanged.
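
A bounded worker pool that keeps results in gate order can be sketched with plain promises. `runPool` and `onDone` are hypothetical names. In the real runner each task would spawn one gate subprocess, and `width` would be the CPU count minus two.

```typescript
// Run `tasks` with at most `width` in flight. `onDone` fires as each task
// settles (streaming), while the returned array keeps the input (gate) order.
// Each task should resolve to a pass/fail record rather than reject.
async function runPool<T>(
  tasks: Array<() => Promise<T>>,
  width: number,
  onDone?: (index: number, result: T) => void,
): Promise<T[]> {
  const results = new Array<T>(tasks.length);
  let next = 0; // next task index; claimed synchronously, so no two workers share one

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++;
      const r = await tasks[i]();
      results[i] = r;
      onDone?.(i, r);
    }
  }

  const n = Math.max(1, Math.min(width, tasks.length));
  await Promise.all(Array.from({ length: n }, () => worker()));
  return results;
}
```

Streaming uses the completion order, while the final summary uses the array order. That split is what lets the exit code and the summary stay unchanged while the wall-clock shrinks.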

Linked issue: Error-tolerant parsing mode: a live tree through transiently-invalid edit states
