Skip to content

[lake/lance] Harden LanceLakeCommitter close() and commit() error paths#3571

Open
XuQianJin-Stars wants to merge 1 commit into
apache:mainfrom
XuQianJin-Stars:fix/lance-committer-resource-leak
Open

[lake/lance] Harden LanceLakeCommitter close() and commit() error paths#3571
XuQianJin-Stars wants to merge 1 commit into
apache:mainfrom
XuQianJin-Stars:fix/lance-committer-resource-leak

Conversation

@XuQianJin-Stars

Copy link
Copy Markdown
Contributor

Two failure-side improvements to LanceLakeCommitter:

  1. commit() now catches RuntimeException from LanceDatasetAdapter and rethrows it as an IOException with the original cause preserved and the dataset URI in the message. Previously, Lance JNI/runtime exceptions escaped raw, so the tiering orchestrator could not treat them uniformly with other IO failures and operators had no context about which dataset failed.

  2. close() now catches IllegalStateException that Arrow's RootAllocator raises when child buffers were leaked, and rethrows it as an IOException with an actionable message identifying the source. The allocator is still closed on happy paths and its close() call is no longer skipped by upstream failures - the exact leak text is kept as the cause so that debugging remains straightforward.

Add LanceLakeCommitterTest covering:

  • close() after construction with no work performed does not leak.
  • commit() against a non-existent dataset raises IOException with a RuntimeException cause.
  • getMissingLakeSnapshot() against a missing dataset can be called and the committer can still be closed cleanly afterwards.

Purpose

Improve the failure-side robustness of the Lance tiering committer so that Lance-specific runtime errors are surfaced uniformly as IOExceptions (matching the LakeCommitter contract), the underlying Arrow allocator is always released on close, and any leaked child buffers produce an actionable error message that clearly identifies the source. This unblocks reliable retry / abort handling by the tiering orchestrator and reduces the operational cost of diagnosing tiering failures against Lance datasets.

Linked issue: close #xxx

Brief change log

  • LanceLakeCommitter#commit: wrap RuntimeException from LanceDatasetAdapter.commitAppend as IOException, preserving the original cause and including the dataset URI in the message.
  • LanceLakeCommitter#close: guard allocator.close() with a try/catch that converts Arrow's IllegalStateException (raised on leaked child buffers) into an IOException with an actionable message, keeping the original stack as the cause.
  • Add LanceLakeCommitterTest under fluss-lake/fluss-lake-lance/src/test/java/org/apache/fluss/lake/lance/tiering/ covering the three failure-side branches described above.

Tests

  • New LanceLakeCommitterTest (3 test methods) covers:
    • testCloseReleasesAllocatorEvenIfCommitNeverCalled
    • testCommitOnMissingDatasetIsWrappedAsIOException
    • testGetMissingLakeSnapshotReturnsNullWhenDatasetAbsent
  • Full module test run: ./mvnw -pl fluss-lake/fluss-lake-lance test -Dspotless.check.skip — 22/22 pass, no regressions in existing LanceTieringTest / LanceArrowUtilsTest / ArrowDataConverterTest.
  • Checkstyle: ./mvnw -pl fluss-lake/fluss-lake-lance checkstyle:check — 0 violations.

API and Format

  • No public API changes.
  • LanceLakeCommitter#commit continues to declare throws IOException; only the exception type produced on the failure path is tightened (previously RuntimeException could escape). Callers that already handled IOException from LakeCommitter#commit now get a strictly better contract.
  • LanceLakeCommitter#close continues to declare throws Exception; leaks now surface as IOException with a clear message instead of raw IllegalStateException from Arrow.
  • No wire format, table format, or on-disk layout changes.

Documentation

  • No user-facing documentation change is required.
  • The rationale for the two failure-path branches is captured as inline Javadoc/comments in LanceLakeCommitter so that future contributors do not accidentally revert the wrapping.

Two failure-side improvements to LanceLakeCommitter:

1. commit() now catches RuntimeException from LanceDatasetAdapter and
   rethrows it as an IOException with the original cause preserved and
   the dataset URI in the message. Previously, Lance JNI/runtime
   exceptions escaped raw, so the tiering orchestrator could not treat
   them uniformly with other IO failures and operators had no context
   about which dataset failed.

2. close() now catches IllegalStateException that Arrow's RootAllocator
   raises when child buffers were leaked, and rethrows it as an
   IOException with an actionable message identifying the source. The
   allocator is still closed on happy paths and its close() call is no
   longer skipped by upstream failures - the exact leak text is kept as
   the cause so that debugging remains straightforward.

Add LanceLakeCommitterTest covering:
- close() after construction with no work performed does not leak.
- commit() against a non-existent dataset raises IOException with a
  RuntimeException cause.
- getMissingLakeSnapshot() against a missing dataset can be called and
  the committer can still be closed cleanly afterwards.
@XuQianJin-Stars XuQianJin-Stars force-pushed the fix/lance-committer-resource-leak branch from 9ad17c1 to 3147c2d Compare July 4, 2026 05:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant