Clear monitor-pending RAA once regenerated#4684
Conversation
|
👋 Thanks for assigning @TheBlueMatt as a reviewer! |
|
No new issues found. The production change (
Minor non-blocking observation (not a code bug): the PR description and inline comment frame the fix as happening "whenever The symmetric |
|
🔔 1st Reminder Hey @TheBlueMatt! This PR has been waiting for your review. |
|
Why does this not need backport to 0.1/0.2? |
TheBlueMatt
left a comment
There was a problem hiding this comment.
I'm a bit confused here, why is it safe to always send an RAA (via get_last_revoke_and_ack) based on only signer_pending_... or monitor_pending_...? eg if we reconnect and find that we owe a commitment_signed that is blocked on the signer but have a blocked revoke_and_ack on a monitor update, we'll set signer_pending_raa and then send it if the signer completes even if the monitor is pending.
|
👋 The first review has been submitted! Do you think this PR is ready for a second reviewer? If so, click here to assign a second reviewer. |
The `chanmon_consistency` fuzz target found a reconnect ordering where `signer_pending_revoke_and_ack` and `monitor_pending_revoke_and_ack` could both describe the same owed `revoke_and_ack`. The channel first received a `commitment_signed` whose monitor update completed, but the signer could not provide the next point or secret, leaving `signer_pending_revoke_and_ack` set. Later, receiving the peer `revoke_and_ack` freed holding-cell HTLCs and produced a held monitor update. While that monitor update was still blocked, `channel_reestablish` saw the peer one state behind and recorded `monitor_pending_revoke_and_ack`, plus the corresponding monitor-pending `commitment_signed`, so the messages could be replayed once monitor updating was restored. If the signer unblocked before the held monitor update was released, `signer_maybe_unblocked` generated and sent the already monitor-safe RAA using `signer_pending_revoke_and_ack`. The monitor-pending flag was not cleared at that point, so `monitor_updating_restored` later generated the same RAA again when the held update completed. The peer had already advanced after accepting the signer-unblocked RAA, so it rejected the duplicate secret as not corresponding to its current pubkey and force-closed. Fix this by clearing `monitor_pending_revoke_and_ack` in the signer-resume path only once a signer-pending RAA is actually being returned.
38f6df2 to
27223fd
Compare
I think we can, though I do wonder why this went uncaught for so long if it actually was an issue in those releases as well. Something about our current fuzz harness made this much easier to find. EDIT: It's reachable in the fuzzer now because there's a path to reload with a stale manager. |
TheBlueMatt
left a comment
There was a problem hiding this comment.
This is a trivial fix.
|
Backported in #4706 |
|
Backported to 0.1 in #4710. |
v0.1.10 - Jun 18, 2026 - "Loupe de Loupe" API Updates =========== * `DefaultMessageRouter` will now always generate blinded message paths that provide no privacy (where our node is the introduction node) for nodes with public channels. This works around an issue which will appear for any nodes with LND peers that enable onion messaging - such peers will refuse to forward BOLT 12 messages from unknown third parties, which most BOLT 12 payers rely on today (#4647). * Explicit `amount_msats` of 0 is rejected in BOLT 12 `Offer`s; `OfferBuilder` now maps 0-amounts to an amount of `None` (#4324). Bug Fixes ========= * Async `ChannelMonitorUpdate` persistence operations which complete, but are not marked as complete in a persisted `ChannelManager` prior to restart, followed immediately by a block connection and then another restart could result in some channel operations hanging leading for force-closures (#4377). * If an MPP payment is claimed but `ChannelMonitorUpdate`s for some parts are still being completed asynchronously, further channel updates (e.g. forwarding another payment) are pending and the node restarts, the channel could have become stuck (#4520). * The presence of unconfirmed transactions actually no longer causes `ElectrumSyncClient` to spuriously fail to sync (#4590). * `FilesystemStore::list_all_keys` will no longer fail if there are stale intermediate files lying around from a previous unclean shutdown (#4618). * When forwarding an HTLC while in a blinded path with proportional fees over 200%, LDK will no longer spuriously allow a forward that pays us 1 msat too little in fees (#4697). * Fixed a rare case where a channel could get stuck on reconnect when using both async `ChannelMonitorUpdate` persistence and async signing (#4684). * `Event::PaymentSent::fee_paid_msat` is no longer `None` in cases where `ChannelManager::abandon_payment` was called before the payment ultimately completes anyway (#4651). * Syncing a `ChainMonitor` using the `Confirm` trait will no longer write some full `ChannelMonitor`s to disk several times per block (#4544). * `OMDomainResolver` now correctly accounts for failed queries when rate limiting, ensuring we continue to respond to queries after failures (#4591). * Calling `ChannelManager::send_payment_with_route` without a `route_params` and with an invalid `Route` will no longer panic (#4707). * `lightning-custom-message`'s handling of `peer_connected` events now ensures that sub-handlers will see a `peer_disconnected` event if a different sub-handler refused the connection by `Err`ing `peer_connected` (#4595). * Incomplete MPP keysend payments will no longer see their HTLCs held until expiry (#4558). * `InvoiceRequestBuilder` will no longer accept a `quantity` of `0` for a BOLT 12 `Offer`, allowing any quantity up to a bound (#4667). * `lightning-custom-message` handlers that return `Ok(None)` when asked to deserialize a message in their defined range no longer cause panics (#4709). * Several spurious debug assertions were fixed (#4537, #4618). Security ======== 0.1.10 fixes a sanitization issue and several denial-of-service vulnerabilities. * `Bolt11Invoice::recover_payee_pub_key` no longer panics if called on an invoice which set an explicit public key, rather than relying on public key recovery. This method is called from `payment_parameters_from_invoice` and `payment_parameters_from_variable_amount_invoice` (#4717). * Maliciously-crafted unpayable invoices which have overflowing feerates will no longer cause an `unwrap` failure panic (#4716). * `possiblyrandom` did not properly generate random data except when it was explicitly configured to. By default this means LDK is vulnerable to various HashDoS attacks (#4719). * `OMNameResolver` will no longer panic when looking up payment instructions which include unicode characters at the start of a TXT record (#4718). * `PrintableString` did not properly sanitize unicode format characters, allowing an attacker to corrupt the rendering of logs or UI (#4593, #4605). * RGS data is now limited in how large of a graph it is able to cause a client to store in memory. Note that RGS data is still considered a DoS vector in general and you should only use semi-trusted RGS data (#4713). * Counterparty-provided strings in failure messages are no longer logged in full, reducing the ability of such a counterparty to spam our logs (#4714). * Reading a corrupted `ChannelManager` or `ProbabilisticScorer` can no longer cause us to allocate large amounts of memory (#4712). Thanks to Project Loupe for reporting most of the issues fixed in this release.
The
chanmon_consistencyfuzz target found a reconnect ordering wheresigner_pending_revoke_and_ackandmonitor_pending_revoke_and_ackcould both describe the same owedrevoke_and_ack.The channel first received a
commitment_signedwhose monitor update completed, but the signer could not provide the next point or secret, leavingsigner_pending_revoke_and_ackset. Later, receiving the peerrevoke_and_ackfreed holding-cell HTLCs and produced a held monitor update. While that monitor update was still blocked,channel_reestablishsaw the peer one state behind and recordedmonitor_pending_revoke_and_ack, plus the corresponding monitor-pendingcommitment_signed, so the messages could be replayed once monitor updating was restored.If the signer unblocked before the held monitor update was released,
signer_maybe_unblockedgenerated and sent the RAA usingsigner_pending_revoke_and_ack. The monitor-pending flag was not cleared at that point, somonitor_updating_restoredlater generated the same RAA again when the held update completed. The peer had already advanced after accepting the signer-unblocked RAA, so it rejected the duplicate secret as not corresponding to its current pubkey and force-closed.Fix this by clearing
monitor_pending_revoke_and_ackwheneverget_last_revoke_and_acksuccessfully constructs an RAA, alongsidesigner_pending_revoke_and_ack. All resend paths regenerate RAAs through this helper, so successful generation through either pending path satisfies the other pending record. If generation fails, pending signer state is still left set and monitor-pending state remains available for monitor restoration to retry.This failure was discovered in https://github.com/lightningdevkit/rust-lightning/actions/runs/26905971318/job/79370860747.