
[lake/lance] Add tests for Decimal and Timestamp(LTZ) conversions #3573

Open
XuQianJin-Stars wants to merge 1 commit into apache:main from XuQianJin-Stars:feat/lance-decimal-tsltz

@XuQianJin-Stars

Contributor

Extend Lance test coverage for Decimal and Timestamp/TimestampLtz mappings on both the Fluss RowType -> non-shaded Arrow schema layer (LanceArrowUtils) and the shaded -> non-shaded VectorSchemaRoot copy layer (ArrowDataConverter):

  • LanceArrowUtilsTest: assert Arrow type for DECIMAL(9,2) / DECIMAL(20,4) including precision and scale round-trip; assert TimeUnit selection for TIMESTAMP_LTZ(0/3/6/9) and TIMESTAMP(0/3/6/9).
  • ArrowDataConverterTest: end-to-end conversion of DECIMAL(20,4) values through ShadedArrowBatchWriter into a non-shaded DecimalVector, and TIMESTAMP_LTZ(3) / TIMESTAMP_LTZ(6) columns into TimeStampMilliVector and TimeStampMicroVector respectively, including sub-millisecond precision truncation semantics.

These types are common in financial and event-time workloads; adding explicit coverage guards against future regressions in the Lance tiering path.

Purpose

Lance tiering already handles DECIMAL, TIMESTAMP and TIMESTAMP_LTZ in production code, but the existing unit tests did not directly exercise:

  1. The precision/scale round-trip for DECIMAL on the Fluss RowType -> non-shaded Arrow schema mapping,
  2. The TimeUnit selection rules (SECOND / MILLISECOND / MICROSECOND / NANOSECOND) for TIMESTAMP and TIMESTAMP_LTZ at every supported precision (0/3/6/9),
  3. The end-to-end value conversion from Fluss Decimal / TimestampLtz through ShadedArrowBatchWriter into the corresponding non-shaded Arrow vectors (including sub-millisecond truncation for TIMESTAMP_LTZ(3) and microsecond preservation for TIMESTAMP_LTZ(6)).
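
The TimeUnit selection rule in item 2 can be sketched in plain Java as follows. This is only an illustration of the mapping the new tests assert, not the code in LanceArrowUtils: the class name `TimestampUnitSketch` and the local `Unit` enum are hypothetical stand-ins (the enum mirrors the constant names of Arrow's `TimeUnit`), and only the four precisions named in this PR are covered, since the behavior for intermediate precisions is not stated here.

```java
// Hypothetical sketch: precision -> time unit, limited to the precisions
// exercised by the new tests (0, 3, 6, 9). Not the LanceArrowUtils code.
final class TimestampUnitSketch {

    // Local stand-in that mirrors the constant names of Arrow's TimeUnit.
    enum Unit { SECOND, MILLISECOND, MICROSECOND, NANOSECOND }

    static Unit unitForPrecision(int precision) {
        switch (precision) {
            case 0:
                return Unit.SECOND;
            case 3:
                return Unit.MILLISECOND;
            case 6:
                return Unit.MICROSECOND;
            case 9:
                return Unit.NANOSECOND;
            default:
                // Intermediate precisions are deliberately left out of this sketch.
                throw new IllegalArgumentException(
                        "precision not covered by this sketch: " + precision);
        }
    }
}
```

The same mapping applies to both TIMESTAMP and TIMESTAMP_LTZ according to the test descriptions; the two differ in time-zone handling, which this sketch does not model.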

This PR closes that gap so that any future refactor of LanceArrowUtils or ArrowDataConverter surfaces regressions on these widely used types immediately.

Linked issue: close #xxx

Brief change log

Test-only change. No production code is touched.

  • LanceArrowUtilsTest: add three tests (+65 lines)
    • testToArrowSchemaWithDecimal — asserts ArrowType.Decimal(9,2) and ArrowType.Decimal(20,4) are produced for the corresponding Fluss types.
    • testToArrowSchemaWithTimestampLtz — asserts TIMESTAMP_LTZ(0/3/6/9) maps to Arrow Timestamp with SECOND / MILLISECOND / MICROSECOND / NANOSECOND.
    • testToArrowSchemaWithTimestampNtz — same coverage for TIMESTAMP(0/3/6/9) (no time zone).
  • ArrowDataConverterTest: add three tests (+104 lines)
    • testConvertDecimalColumn — writes three DECIMAL(20,4) values (positive / negative / very small) via ShadedArrowBatchWriter, converts to a non-shaded VectorSchemaRoot, and verifies each value using BigDecimal.compareTo on DecimalVector#getObject.
    • testConvertTimestampLtzMillisColumn — writes three TIMESTAMP_LTZ(3) values built with TimestampLtz.fromEpochMillis and asserts round-trip via TimeStampMilliVector#get.
    • testConvertTimestampLtzMicrosColumn — writes TIMESTAMP_LTZ(6) values with sub-millisecond components (Instant.ofEpochMilli(...).plusNanos(...)) and asserts the expected microsecond value in TimeStampMicroVector (i.e. millis * 1_000 + nanoOfMillis / 1_000), locking in the truncation semantics.
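
The value checks described for ArrowDataConverterTest can be sketched with the JDK alone. The class and method names below (`ConversionChecksSketch`, `expectedMicros`, `sameNumericValue`) are hypothetical and do not appear in the PR; the arithmetic follows the formula quoted above (millis * 1_000 + nanoOfMillis / 1_000), and the decimal helper shows why `BigDecimal.compareTo` is a reasonable choice over `equals` when the scale of the returned value may differ.

```java
import java.math.BigDecimal;
import java.time.Instant;

// Hypothetical helpers that mirror the assertions described in this PR.
final class ConversionChecksSketch {

    // Expected TIMESTAMP_LTZ(6) value: whole milliseconds scaled to
    // microseconds, plus the sub-millisecond nanos truncated to microseconds.
    static long expectedMicros(Instant instant) {
        long millis = instant.toEpochMilli();
        int nanoOfMillis = instant.getNano() % 1_000_000;
        return millis * 1_000L + nanoOfMillis / 1_000;
    }

    // Numeric comparison that ignores scale: 1.50 and 1.5000 compare as equal
    // here, whereas BigDecimal.equals would treat them as different.
    static boolean sameNumericValue(BigDecimal expected, BigDecimal actual) {
        return expected.compareTo(actual) == 0;
    }
}
```

For example, `Instant.ofEpochMilli(1_700_000_000_123L).plusNanos(456_789)` yields 1_700_000_000_123_456 microseconds under this formula: the trailing 789 nanoseconds are dropped rather than rounded.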

Tests

New tests are the deliverable of this PR. Verified locally:

  • ./mvnw -pl fluss-lake/fluss-lake-lance test -Dspotless.check.skip — all Lance tests pass, including the 6 new ones.
  • ./mvnw -pl fluss-lake/fluss-lake-lance checkstyle:check — 0 violations.

No production behavior change; existing tests are unchanged.

API and Format

None. This PR only adds tests under src/test/java. There are no changes to public Java API, table properties, wire format, or on-disk / Lance dataset format.

Documentation

None. Test-only change; no user-facing documentation impact.

@XuQianJin-Stars force-pushed the feat/lance-decimal-tsltz branch from 4949d21 to 001c893 on July 4, 2026 05:31