Attempt to unblock blocked monitor updates on startup by TheBlueMatt · Pull Request #4520 · lightningdevkit/rust-lightning

TheBlueMatt · 2026-03-30T01:15:33Z

When we make an MPP claim we push RAA blockers for each channel to ensure we don't allow any single channel to make too much progress until all channels have the preimage durably on disk. We don't have to store those RAA blockers on disk in the ChannelManager as there's no point - if the ChannelManager gets to disk with the RAA blockers it also brought with it the pending ChannelMonitorUpdates that contain the preimages and will now be replayed, ensuring the preimage makes it to all ChannelMonitors.

However, just because those RAA blockers disappear on reload doesn't mean their implications do too - if a later ChannelMonitorUpdate was blocked in the channel we don't have logic to unblock it on startup.

Here we add such logic, simply attempting to unblock all blocked ChannelMonitorUpdates that existed on startup.
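To make the failure mode concrete, here is a minimal toy model of it. It is only a sketch: `ToyChannel`, `reload`, `try_release_blocked` and `unblock_all_on_startup` are hypothetical names invented for illustration and are not LDK types or functions. The model assumes only what the description states: blocked updates are persisted, RAA blockers are not.

```rust
use std::collections::VecDeque;

/// Toy stand-in for a channel; hypothetical, not LDK's real channel type.
#[derive(Debug, Default)]
pub struct ToyChannel {
    /// Monitor updates queued behind a blocker. These survive a reload.
    pub blocked_updates: VecDeque<u64>,
    /// In-memory RAA blockers (opaque ids here). Never written to disk.
    pub raa_blockers: Vec<u32>,
    /// Updates that have been released, in order.
    pub released: Vec<u64>,
}

impl ToyChannel {
    /// Simulate a serialize/deserialize round trip: the blocked updates
    /// come back, the RAA blockers do not.
    pub fn reload(&self) -> ToyChannel {
        ToyChannel {
            blocked_updates: self.blocked_updates.clone(),
            raa_blockers: Vec::new(),
            released: self.released.clone(),
        }
    }

    /// Release queued updates, but only if nothing is blocking them.
    /// Returns how many updates were released.
    pub fn try_release_blocked(&mut self) -> usize {
        if !self.raa_blockers.is_empty() {
            return 0;
        }
        let released_count = self.blocked_updates.len();
        self.released.extend(self.blocked_updates.drain(..));
        released_count
    }
}

/// The startup pass: attempt an unblock on every channel that came back
/// from disk with blocked updates. Without this pass nothing ever calls
/// `try_release_blocked` for them, so they stay queued.
pub fn unblock_all_on_startup(channels: &mut [ToyChannel]) -> usize {
    channels
        .iter_mut()
        .filter(|c| !c.blocked_updates.is_empty())
        .map(|c| c.try_release_blocked())
        .sum()
}
```

In this model a channel that had a blocker and one queued update before `reload` comes back with the update still queued and no blocker left to trigger its release; calling `unblock_all_on_startup` is what moves it into `released`.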

Code written by Claude.

Fixes #4518

Needs a test, I have a start on one but need to clean it up

ldk-reviews-bot · 2026-03-30T01:15:36Z

👋 I see @tankyleo was un-assigned.
If you'd like another reviewer assignment, please click here.

joostjager · 2026-05-06T11:11:32Z

There was already a unit test in the issue linked in the description. On branch main...joostjager:2026-03-startup-mpp-unblock-with-test, it's added as a commit, and it shows that your fix makes the test pass.

joostjager · 2026-05-06T13:00:20Z

Discussed offline and improved unit test by ignoring some ignored read outs and 'loops until settled'.

TheBlueMatt · 2026-05-06T20:38:52Z

Thanks! Rebased and included the test.

ldk-claude-review-bot · 2026-05-06T20:47:33Z

Good — takes msgs::UpdateFulfillHTLC by value, so the clone() in the test is needed and correct.

I've done a thorough review of the implementation and test. My prior review was correct — no issues found.

No issues found.

The implementation is correct:

New BackgroundEvent::AttemptUnblockMonitorUpdates variant — all pattern matches across the codebase are exhaustive. The variant is never serialized (background events are hardcoded to count 0 at line 18223).
Deserialization logic (lines 19471-19477) — correctly queued after MonitorUpdateRegeneratedOnStartup/MonitorUpdatesComplete events, ensuring preimage replays are processed first. Only triggers for channels with blocked_monitor_updates_pending() > 0.
Event processing (lines 8803-8808) — handle_monitor_update_release(counterparty_node_id, channel_id, None) safely gates on raa_monitor_updates_held(), which correctly checks both actions_blocking_raa_monitor_updates (empty on fresh startup) and pending_events for any ReleaseRAAChannelMonitorUpdate completion actions that may have been created during the same process_background_events pass. If a PaymentClaimed event with such an action exists, the blocked updates are held until the user processes it; otherwise they're released immediately.
Assertion at line 9765 — AttemptUnblockMonitorUpdates { .. } => false is correct; this debug assertion checks for preimage replay events for the claiming channel.
Test — comprehensive coverage of the full cycle: two-channel MPP claim with asymmetric monitor state on reload, verifying both the startup-released fulfill (channel A) and the event-completion-released fulfill (channel B).
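The ordering and gating described in the review can be sketched with a small standalone model. This is an illustration under stated assumptions, not the code in the PR: `ToyBackgroundEvent`, `ToyChannelState`, `build_startup_events` and `updates_held` are hypothetical names, and the two-condition gate only mirrors what the review says `raa_monitor_updates_held()` checks.

```rust
use std::collections::VecDeque;

/// Simplified stand-ins for the background events named in the review.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ToyBackgroundEvent {
    /// Stand-in for the preimage replay events queued on startup.
    ReplayMonitorUpdate { channel_id: u32 },
    /// Stand-in for `AttemptUnblockMonitorUpdates`.
    AttemptUnblock { channel_id: u32 },
}

/// What the toy model knows about a channel right after deserialization.
pub struct ToyChannelState {
    pub channel_id: u32,
    pub has_pending_replay: bool,
    pub blocked_update_count: usize,
}

/// Queue all replay events first, then one unblock attempt per channel
/// that actually has blocked updates, so replays are processed first.
pub fn build_startup_events(channels: &[ToyChannelState]) -> VecDeque<ToyBackgroundEvent> {
    let mut events = VecDeque::new();
    for c in channels.iter().filter(|c| c.has_pending_replay) {
        events.push_back(ToyBackgroundEvent::ReplayMonitorUpdate { channel_id: c.channel_id });
    }
    for c in channels.iter().filter(|c| c.blocked_update_count > 0) {
        events.push_back(ToyBackgroundEvent::AttemptUnblock { channel_id: c.channel_id });
    }
    events
}

/// The gate applied when an unblock attempt is processed: updates stay
/// held if there is an in-memory blocker or a pending event that will
/// release them once the user handles it.
pub fn updates_held(in_memory_blockers: usize, pending_release_actions: usize) -> bool {
    in_memory_blockers > 0 || pending_release_actions > 0
}
```

With fresh-startup state (no in-memory blockers), `updates_held(0, 0)` is false and the blocked updates go out immediately, while `updates_held(0, 1)` models the case where a pending event still has to be handled first.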

codecov · 2026-05-06T22:18:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.14%. Comparing base (1d36f7b) to head (01d55dc).
⚠️ Report is 76 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4520      +/-   ##
==========================================
- Coverage   86.16%   86.14%   -0.03%     
==========================================
  Files         156      157       +1     
  Lines      108669   108958     +289     
  Branches   108669   108958     +289     
==========================================
+ Hits        93638    93857     +219     
- Misses      12420    12484      +64     
- Partials     2611     2617       +6

Flag	Coverage Δ
tests	`86.14% <100.00%> (-0.03%)`	⬇️

joostjager

Even though the fix is just a few lines, it is still hard to fully understand all the consequences. But handle_monitor_update_release called with None seems safe with its double-check, and it fixes the bug.

joostjager · 2026-05-07T07:25:14Z

 }

+#[test]
+fn test_mpp_claim_htlc_fulfills_unblocked_on_reload() {


I think you picked the commit message from the first commit on my branch where the test was bug-reproducing ("verifies the bug leaves an htlc stuck"), and combined it with the final test.

TheBlueMatt · 2026-05-12

Oops, tweaked the commit message. Looks like we didn't get a second reviewer on this so hit the button.

ldk-reviews-bot · 2026-05-07T07:34:40Z

👋 The first review has been submitted!

Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer.

joostjager · 2026-05-07T11:28:35Z

Additional verification of this PR: #4601

ldk-reviews-bot · 2026-05-12T13:02:45Z

✅ Added second reviewer: @tankyleo

Add a characterization test for a claimed MPP payment whose preimage monitor updates are only partially persisted before restart. The test drives both channels through a held fee-update commitment dance, claims with async monitor persistence, reloads one fresh and one stale monitor, and verifies that we don't leave a sender-side HTLC stuck after reconnect.

Implements cross-version serialization testing in chanmon_consistency fuzzer by depending on lightning_0_2. Test harnesses now implement both current and v0.2 traits, and reload_node performs round-trip testing. Discovered bug lightningdevkit#4518 (fixed in lightningdevkit#4520). Addresses lightningdevkit#4452

TheBlueMatt · 2026-06-11T01:05:42Z

Backported to 0.1 in #4680.

TheBlueMatt · 2026-06-11T20:52:34Z

Backported to 0.2 in #4683.

v0.1.10 - Jun 18, 2026 - "Loupe de Loupe"

API Updates
===========

* `DefaultMessageRouter` will now always generate blinded message paths that provide no privacy (where our node is the introduction node) for nodes with public channels. This works around an issue which will appear for any nodes with LND peers that enable onion messaging - such peers will refuse to forward BOLT 12 messages from unknown third parties, which most BOLT 12 payers rely on today (#4647).
* Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder` now maps 0-amounts to an amount of `None` (#4324).

Bug Fixes
=========

* Async `ChannelMonitorUpdate` persistence operations which complete, but are not marked as complete in a persisted `ChannelManager` prior to restart, followed immediately by a block connection and then another restart could result in some channel operations hanging leading to force-closures (#4377).
* If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are still being completed asynchronously, further channel updates (e.g. forwarding another payment) are pending and the node restarts, the channel could have become stuck (#4520).
* The presence of unconfirmed transactions actually no longer causes `ElectrumSyncClient` to spuriously fail to sync (#4590).
* `FilesystemStore::list_all_keys` will no longer fail if there are stale intermediate files lying around from a previous unclean shutdown (#4618).
* When forwarding an HTLC while in a blinded path with proportional fees over 200%, LDK will no longer spuriously allow a forward that pays us 1 msat too little in fees (#4697).
* Fixed a rare case where a channel could get stuck on reconnect when using both async `ChannelMonitorUpdate` persistence and async signing (#4684).
* `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where `ChannelManager::abandon_payment` was called before the payment ultimately completes anyway (#4651).
* Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some full `ChannelMonitor`s to disk several times per block (#4544).
* `OMDomainResolver` now correctly accounts for failed queries when rate limiting, ensuring we continue to respond to queries after failures (#4591).
* Calling `ChannelManager::send_payment_with_route` without a `route_params` and with an invalid `Route` will no longer panic (#4707).
* `lightning-custom-message`'s handling of `peer_connected` events now ensures that sub-handlers will see a `peer_disconnected` event if a different sub-handler refused the connection by `Err`ing `peer_connected` (#4595).
* Incomplete MPP keysend payments will no longer see their HTLCs held until expiry (#4558).
* `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a BOLT 12 `Offer`, allowing any quantity up to a bound (#4667).
* `lightning-custom-message` handlers that return `Ok(None)` when asked to deserialize a message in their defined range no longer cause panics (#4709).
* Several spurious debug assertions were fixed (#4537, #4618).

Security
========

0.1.10 fixes a sanitization issue and several denial-of-service vulnerabilities.

* `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an invoice which set an explicit public key, rather than relying on public key recovery. This method is called from `payment_parameters_from_invoice` and `payment_parameters_from_variable_amount_invoice` (#4717).
* Maliciously-crafted unpayable invoices which have overflowing feerates will no longer cause an `unwrap` failure panic (#4716).
* `possiblyrandom` did not properly generate random data except when it was explicitly configured to. By default this means LDK is vulnerable to various HashDoS attacks (#4719).
* `OMNameResolver` will no longer panic when looking up payment instructions which include unicode characters at the start of a TXT record (#4718).
* `PrintableString` did not properly sanitize unicode format characters, allowing an attacker to corrupt the rendering of logs or UI (#4593, #4605).
* RGS data is now limited in how large of a graph it is able to cause a client to store in memory. Note that RGS data is still considered a DoS vector in general and you should only use semi-trusted RGS data (#4713).
* Counterparty-provided strings in failure messages are no longer logged in full, reducing the ability of such a counterparty to spam our logs (#4714).
* Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer cause us to allocate large amounts of memory (#4712).

Thanks to Project Loupe for reporting most of the issues fixed in this release.

TheBlueMatt added backport 0.1 backport 0.2 labels Mar 30, 2026

TheBlueMatt mentioned this pull request Apr 6, 2026

fuzz: Add upgrade/downgrade simulation to chanmon_consistency and fix chacha20 build #4499

Closed

This was referenced Apr 15, 2026

fuzz: cover deferred writing in chanmon_consistency #4465

Merged

MPP claim HTLC fulfills stuck in holding cell after node restart #4518

Closed

TheBlueMatt force-pushed the 2026-03-startup-mpp-unblock branch from a38acca to 6977e25 Compare May 6, 2026 20:38

TheBlueMatt marked this pull request as ready for review May 6, 2026 20:38

ldk-reviews-bot requested a review from joostjager May 6, 2026 20:39

TheBlueMatt force-pushed the 2026-03-startup-mpp-unblock branch from 6977e25 to 52a0030 Compare May 6, 2026 21:01

joostjager approved these changes May 7, 2026

joostjager mentioned this pull request May 7, 2026

fuzz: add chanmon stuck HTLC invariant #4601

Merged

ldk-reviews-bot requested a review from tankyleo May 12, 2026 13:02

TheBlueMatt force-pushed the 2026-03-startup-mpp-unblock branch from 52a0030 to 01d55dc Compare May 12, 2026 13:13

tankyleo requested review from valentinewallace and removed request for tankyleo May 12, 2026 18:03

valentinewallace approved these changes May 14, 2026

valentinewallace merged commit 090930c into lightningdevkit:main May 14, 2026
23 of 24 checks passed

Atishyy27 mentioned this pull request May 19, 2026

fuzz: Add cross-version ChannelMonitor roundtrip coverage #4622

Open

TheBlueMatt mentioned this pull request Jun 11, 2026

[0.1] Initial round of backports for 0.1.10 #4680

Merged

TheBlueMatt removed the backport 0.1 label Jun 11, 2026

TheBlueMatt mentioned this pull request Jun 11, 2026

[0.2] Initial batch of 0.2.3 backports #4683

Merged

TheBlueMatt removed the backport 0.2 label Jun 11, 2026
