Skip to content

fix(decimal): use minimal byte length for negative powers of two#3523

Open
vishnuprakaz wants to merge 1 commit into
apache:mainfrom
vishnuprakaz:fix/decimal-bytes-required-minimal-length
Open

fix(decimal): use minimal byte length for negative powers of two#3523
vishnuprakaz wants to merge 1 commit into
apache:mainfrom
vishnuprakaz:fix/decimal-bytes-required-minimal-length

Conversation

@vishnuprakaz

Copy link
Copy Markdown
Contributor

Closes #3522

Rationale for this change

bytes_required should return the minimum number of bytes for
a value, but it returns one byte too many for negatives like
-128 and -32768.

This matters because the decimal bucket transform hashes those
bytes. The extra byte changes the hash, so PyIceberg can put a
value in a different bucket than Spark/Java. For example,
Decimal("-1.28") goes to bucket 12 in PyIceberg but bucket 13
everywhere else.

The fix computes the length from (value + 1) for negative
values, which gives the true minimum and matches the Iceberg
spec.

Are these changes tested?

Yes. Added tests for the -128 / -32768 / -8388608 boundary cases
(where the old code was wrong) plus positive/negative controls.
Lint and the decimal/conversion/transform tests pass.

Are there any user-facing changes?

Yes. For decimal values like -1.28, the computed bucket changes
(e.g. 12 → 13) so it now matches other Iceberg engines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bytes_required returns one byte too many for values like -128, -32768

1 participant