feat: Migrate bucket module #71
Thank you @ChaomingZhangCN for contributing the bucket function framework, and @liangjie3138 for the BucketSelectConverter, both migrated as part of this batch. 🎉
leaves12138 left a comment
Thanks for migrating the bucket module. I reviewed PR #71 at head ba8560c200f768354b795290dd017225c1a671db.
Treating the remaining prerequisite dependency state as expected migration-period drift, I did not find a blocking correctness issue in the migrated bucket id calculator, default/mod/hive bucket functions, bucket select converter, Hive hasher, and related tests. I also checked the PR-local Decimal128 reconstruction change in bucket_id_calculator.cpp; it avoids shifting a signed negative high word and looks like a correct safety fix rather than a behavior regression.
Please make sure the dependent migration PRs are merged or rebased in, and run the relevant C++ build/tests from a clean checkout before final merge.
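The Decimal128 note in the review can be illustrated with a scaled-down sketch. The code below is not taken from bucket_id_calculator.cpp; `CombineWords` is a hypothetical helper that rebuilds a signed 64-bit value from a signed 32-bit high word and an unsigned 32-bit low word. It assumes the same pattern carries over to rebuilding a 128-bit decimal from 64-bit halves: convert the high word to an unsigned type before shifting, then reinterpret the combined bits.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper, not from the PR: a 64-bit analog of rebuilding a
// 128-bit decimal from a signed high word and an unsigned low word.
inline int64_t CombineWords(int32_t high, uint32_t low) {
    // Convert the high word to unsigned before shifting. Left-shifting a
    // negative signed value is undefined behavior before C++20, whereas an
    // unsigned shift is always well defined.
    uint64_t bits =
        (static_cast<uint64_t>(static_cast<uint32_t>(high)) << 32) | low;

    // Copy the bit pattern into the signed result. int64_t is two's
    // complement, so this recovers the intended negative values without
    // relying on implementation-defined narrowing conversions.
    int64_t result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}
```

For example, `CombineWords(-1, 0xFFFFFFFFu)` yields -1, and `CombineWords(-1, 0)` yields -4294967296. The naive form `static_cast<int64_t>(high) << 32` would shift a negative value in both cases.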
Purpose
No linked issue.
Migrate bucket ID calculation, bucket functions, and bucket select converter:
Bucket interfaces (include/paimon/bucket/):
- BucketIdCalculator: computes bucket ID for a given row (bucket_id_calculator.h)
- BucketFunctionType: enum for bucket function types (bucket_function_type.h)

Bucket functions (src/paimon/core/bucket/):
- BucketFunction: abstract bucket function interface (bucket_function.h)
- BucketIdCalculator: bucket ID calculation with hash-based routing (bucket_id_calculator.cpp)
- HiveBucketFunction: Hive-compatible bucket hash function (hive_bucket_function.h/cpp)
- HiveHasher: Hive ObjectInspector-compatible hashers for all data types (hive_hasher.h)
- ModBucketFunction: simple modulo-based bucket function (mod_bucket_function.h/cpp)
- DefaultBucketFunction: default bucket function using MurmurHash (default_bucket_function.h)
- BucketSelectConverter: converts predicates to bucket filter for scan pruning (bucket_select_converter.h/cpp)

Tests
- bucket_id_calculator_test.cpp: bucket ID calculation correctness
- hive_bucket_function_test.cpp: Hive bucket hash Java compatibility
- mod_bucket_function_test.cpp: modulo bucket function
- default_bucket_function_test.cpp: default bucket function
- bucket_select_converter_test.cpp: predicate to bucket filter conversion

API and Format
Documentation
Generative AI tooling
Migrate-by: Aone Copilot (Claude)