Roll back composite sub-handlers when one rejects peer_connected #4595

Merged
TheBlueMatt merged 1 commit into lightningdevkit:main from tnull:2026-05-composite-handler-peer-connected-rollback on May 6, 2026
Conversation

@tnull

@tnull tnull commented May 5, 2026

Contributor

`composite_custom_message_handler!` expanded `peer_connected` to call every sub-handler and remember the last error, but never undid the ones that had already succeeded. The `CustomMessageHandler::peer_connected` contract is that `PeerManager` will not invoke `peer_disconnected` when `peer_connected` returns `Err`, so any per-peer state allocated by an earlier sub-handler that returned `Ok` was leaked permanently once a later sub-handler returned `Err`.

A peer who can elicit `Err` from any sub-handler in the composite (feature-bit gate, banlist, etc.) could repeatedly reconnect to grow that leaked state without bound (slow resource DoS), and "currently connected" predicates in the leaking sub-handler would lie about peers that were actually rejected.

Mirror the rollback pattern `PeerManager` already uses for the four built-in handlers (`peer_handler.rs:2149-2188`): record each sub-handler's `peer_connected` result, and if any returned `Err`, call `peer_disconnected` on the ones that succeeded before propagating the failure.
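
For readers skimming the thread, here is a minimal sketch of the rollback shape described above. It is not the macro expansion from this PR: `PeerLifecycle`, `CountingHandler`, `RejectingHandler`, and `connect_all` are hypothetical names, the peer is reduced to a `u64`, and the real `CustomMessageHandler` methods take different arguments.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the connect/disconnect half of a handler trait.
trait PeerLifecycle {
    fn peer_connected(&self, peer_id: u64) -> Result<(), ()>;
    fn peer_disconnected(&self, peer_id: u64);
}

/// Tracks how many peers it currently believes are connected.
struct CountingHandler {
    connected: AtomicUsize,
}

impl PeerLifecycle for CountingHandler {
    fn peer_connected(&self, _peer_id: u64) -> Result<(), ()> {
        self.connected.fetch_add(1, Ordering::SeqCst);
        Ok(())
    }
    fn peer_disconnected(&self, _peer_id: u64) {
        self.connected.fetch_sub(1, Ordering::SeqCst);
    }
}

/// Rejects every peer, so it must never be asked to roll back.
struct RejectingHandler;

impl PeerLifecycle for RejectingHandler {
    fn peer_connected(&self, _peer_id: u64) -> Result<(), ()> {
        Err(())
    }
    fn peer_disconnected(&self, _peer_id: u64) {
        panic!("rollback must skip the handler that rejected the peer");
    }
}

/// Calls every sub-handler, then undoes the successful ones if any failed.
fn connect_all(handlers: &[&dyn PeerLifecycle], peer_id: u64) -> Result<(), ()> {
    // Record each result first so every sub-handler sees the connection.
    let results: Vec<Result<(), ()>> =
        handlers.iter().map(|h| h.peer_connected(peer_id)).collect();
    if results.iter().any(|r| r.is_err()) {
        for (handler, result) in handlers.iter().zip(&results) {
            if result.is_ok() {
                handler.peer_disconnected(peer_id);
            }
        }
        return Err(());
    }
    Ok(())
}

fn main() {
    let counting = CountingHandler { connected: AtomicUsize::new(0) };
    let rejecting = RejectingHandler;
    let outcome = connect_all(&[&counting, &rejecting], 42);
    assert!(outcome.is_err());
    // The successful handler was rolled back, so nothing is leaked.
    assert_eq!(counting.connected.load(Ordering::SeqCst), 0);
}
```

Without the rollback loop, the counter in this sketch would stay at 1 after the rejected connection, which is the leak described above.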

Co-Authored-By: HAL 9000

@tnull tnull requested a review from jkczyz May 5, 2026 19:16
@ldk-reviews-bot

ldk-reviews-bot commented May 5, 2026

👋 Thanks for assigning @jkczyz as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@ldk-claude-review-bot

ldk-claude-review-bot commented May 5, 2026

Collaborator

No issues found.

I reviewed the entire diff thoroughly:

  • Macro logic: The rollback correctly calls all sub-handlers, records results, and on any failure calls peer_disconnected only on handlers that returned Ok. This matches the peer_connected/peer_disconnected trait contract (line 109 of peer_handler.rs).
  • Macro hygiene: any_err (macro-definition context) and $field (call-site context) are in different hygiene scopes — no collision risk.
  • Variable shadowing: let $field = self.$field.peer_connected(...) shadows the field name locally, but self.$field still correctly accesses the struct field for the peer_disconnected rollback call.
  • Edge cases: Single handler (rollback check is a no-op since the one that failed won't pass is_ok()), zero handlers (always returns Ok(())).
  • Test coverage: Correctly verifies rollback via atomic counter, and debug_assert!(false) in ErroringHandler::peer_disconnected catches any incorrect rollback of the failing handler.
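
The hygiene and shadowing points above can be illustrated with a much-reduced macro. This is a sketch under assumptions, not the code under review: `composite!`, `Tracker`, `Gate`, and the `connect`/`disconnect` method names are hypothetical. The call site deliberately names a field `any_err` to show that it does not collide with the macro's own `any_err` local.

```rust
use std::cell::Cell;

/// Hypothetical reduced macro: generates a composite struct plus a `connect`
/// method using the same shadow-then-rollback shape.
macro_rules! composite {
    ($name:ident { $($field:ident : $ty:ty),* $(,)? }) => {
        struct $name { $($field: $ty,)* }

        impl $name {
            fn connect(&self, peer_id: u64) -> Result<(), ()> {
                // Each local shadows the field *name* only; `self.$field`
                // below is still a field access on the struct.
                $( let $field = self.$field.connect(peer_id); )*
                // `any_err` comes from the macro definition, so it is a
                // different binding from a call-site identifier `any_err`.
                let any_err = false $( || $field.is_err() )*;
                if any_err {
                    $( if $field.is_ok() { self.$field.disconnect(peer_id); } )*
                    return Err(());
                }
                Ok(())
            }
        }
    };
}

/// Counts live peers so a missed rollback is observable.
struct Tracker { live: Cell<u32> }

impl Tracker {
    fn connect(&self, _peer_id: u64) -> Result<(), ()> {
        self.live.set(self.live.get() + 1);
        Ok(())
    }
    fn disconnect(&self, _peer_id: u64) {
        self.live.set(self.live.get() - 1);
    }
}

/// Always rejects; it must never be rolled back.
struct Gate;

impl Gate {
    fn connect(&self, _peer_id: u64) -> Result<(), ()> {
        Err(())
    }
    fn disconnect(&self, _peer_id: u64) {
        unreachable!("the rejecting handler is never rolled back");
    }
}

// A field named `any_err` on purpose, to exercise the hygiene claim.
composite!(Composite { any_err: Tracker, gate: Gate });

fn main() {
    let c = Composite { any_err: Tracker { live: Cell::new(0) }, gate: Gate };
    assert!(c.connect(1).is_err());
    assert_eq!(c.any_err.live.get(), 0);
}
```

Local bindings in `macro_rules!` are hygienic, which is why the two `any_err` bindings stay distinct; field names are resolved by name on the struct, which is why `self.$field` is unaffected by the shadowing `let`.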

Comment thread lightning-custom-message/src/lib.rs Outdated
@codecov

codecov Bot commented May 5, 2026


Codecov Report

❌ Patch coverage is 37.64706% with 53 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.11%. Comparing base (1a26867) to head (5455058).
⚠️ Report is 22 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| lightning-custom-message/src/lib.rs | 37.64% | 51 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4595      +/-   ##
==========================================
- Coverage   86.84%   86.11%   -0.73%     
==========================================
  Files         161      157       -4     
  Lines      109260   108772     -488     
  Branches   109260   108772     -488     
==========================================
- Hits        94882    93668    -1214     
- Misses      11797    12487     +690     
- Partials     2581     2617      +36     
| Flag | Coverage Δ |
|---|---|
| fuzzing-fake-hashes | ? |
| fuzzing-real-hashes | ? |
| tests | 86.11% <37.64%> (-0.11%) ⬇️ |

@jkczyz jkczyz left a comment

Contributor

LGTM aside from needing to add the debug_assert.

`composite_custom_message_handler!` expanded `peer_connected` to call
every sub-handler and remember the last error, but never undo the
already-succeeded ones. The `CustomMessageHandler::peer_connected`
contract is that `PeerManager` will *not* invoke `peer_disconnected`
when `peer_connected` returns `Err` — so any per-peer state allocated
by an earlier sub-handler that returned `Ok` was leaked permanently
once a later sub-handler returned `Err`.

A peer who can elicit `Err` from any sub-handler in the composite
(feature-bit gate, banlist, etc.) could repeatedly reconnect to grow
that leaked state without bound (slow resource DoS), and "currently
connected" predicates in the leaking sub-handler would lie about
peers that were actually rejected.

Mirror the rollback pattern `PeerManager` already uses for the four
built-in handlers (`peer_handler.rs:2149-2188`): record each
sub-handler's `peer_connected` result, and if any returned `Err`,
call `peer_disconnected` on the ones that succeeded before
propagating the failure.

Co-Authored-By: HAL 9000
Signed-off-by: Elias Rohrer <dev@tnull.de>
@tnull tnull force-pushed the 2026-05-composite-handler-peer-connected-rollback branch from 7de6891 to 5455058 May 6, 2026 09:05
@tnull tnull requested review from TheBlueMatt and jkczyz May 6, 2026 09:06
@TheBlueMatt TheBlueMatt merged commit 416dfad into lightningdevkit:main May 6, 2026
23 of 25 checks passed
@TheBlueMatt

Collaborator

Backported to 0.1 in #4680.

@TheBlueMatt

Collaborator

Backported to 0.2 in #4683.

TheBlueMatt added a commit that referenced this pull request Jun 19, 2026
v0.1.10 - Jun 18, 2026 - "Loupe de Loupe"

API Updates
===========

 * `DefaultMessageRouter` will now always generate blinded message paths that
   provide no privacy (where our node is the introduction node) for nodes with
   public channels. This works around an issue which will appear for any nodes
   with LND peers that enable onion messaging - such peers will refuse to
   forward BOLT 12 messages from unknown third parties, which most BOLT 12
   payers rely on today (#4647).
 * Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder`
   now maps 0-amounts to an amount of `None` (#4324).

Bug Fixes
=========

 * Async `ChannelMonitorUpdate` persistence operations which complete, but are
   not marked as complete in a persisted `ChannelManager` prior to restart,
   followed immediately by a block connection and then another restart could
   result in some channel operations hanging, leading to force-closures (#4377).
 * If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are
   still being completed asynchronously, further channel updates (e.g.
   forwarding another payment) are pending and the node restarts, the channel
   could have become stuck (#4520).
 * The presence of unconfirmed transactions actually no longer causes
   `ElectrumSyncClient` to spuriously fail to sync (#4590).
 * `FilesystemStore::list_all_keys` will no longer fail if there are stale
   intermediate files lying around from a previous unclean shutdown (#4618).
 * When forwarding an HTLC while in a blinded path with proportional fees over
   200%, LDK will no longer spuriously allow a forward that pays us 1 msat too
   little in fees (#4697).
 * Fixed a rare case where a channel could get stuck on reconnect when using
   both async `ChannelMonitorUpdate` persistence and async signing (#4684).
 * `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where
   `ChannelManager::abandon_payment` was called before the payment ultimately
   completes anyway (#4651).
 * Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some
   full `ChannelMonitor`s to disk several times per block (#4544).
 * `OMDomainResolver` now correctly accounts for failed queries when rate
   limiting, ensuring we continue to respond to queries after failures (#4591).
 * Calling `ChannelManager::send_payment_with_route` without a `route_params`
   and with an invalid `Route` will no longer panic (#4707).
 * `lightning-custom-message`'s handling of `peer_connected` events now ensures
   that sub-handlers will see a `peer_disconnected` event if a different
   sub-handler refused the connection by `Err`ing `peer_connected` (#4595).
 * Incomplete MPP keysend payments will no longer see their HTLCs held until
   expiry (#4558).
 * `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a
   BOLT 12 `Offer`, allowing any quantity up to a bound (#4667).
 * `lightning-custom-message` handlers that return `Ok(None)` when asked to
   deserialize a message in their defined range no longer cause panics (#4709).
 * Several spurious debug assertions were fixed (#4537, #4618).

Security
========

0.1.10 fixes a sanitization issue and several denial-of-service vulnerabilities.
 * `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an
   invoice which set an explicit public key, rather than relying on public key
   recovery. This method is called from `payment_parameters_from_invoice` and
   `payment_parameters_from_variable_amount_invoice` (#4717).
 * Maliciously-crafted unpayable invoices which have overflowing feerates will
   no longer cause an `unwrap` failure panic (#4716).
 * `possiblyrandom` did not properly generate random data except when it was
   explicitly configured to. By default this meant LDK was vulnerable to
   various HashDoS attacks (#4719).
 * `OMNameResolver` will no longer panic when looking up payment instructions
   which include unicode characters at the start of a TXT record (#4718).
 * `PrintableString` did not properly sanitize unicode format characters,
   allowing an attacker to corrupt the rendering of logs or UI (#4593, #4605).
 * RGS data is now limited in how large of a graph it is able to cause a client
   to store in memory. Note that RGS data is still considered a DoS vector in
   general and you should only use semi-trusted RGS data (#4713).
 * Counterparty-provided strings in failure messages are no longer logged in
   full, reducing the ability of such a counterparty to spam our logs (#4714).
 * Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer
   cause us to allocate large amounts of memory (#4712).

Thanks to Project Loupe for reporting most of the issues fixed in this release.