[arrow] Read inline VECTOR columns via Arrow FixedSizeList (#410 PR 2) #412

Draft
JunRuiLee wants to merge 4 commits into apache:main from JunRuiLee:feat/vector-type-pr2

Conversation

@JunRuiLee

PR 2 of reading Paimon VECTOR columns in paimon-rust, tracked in #410. Enables reading vector columns stored inline in ordinary parquet data files.

Stacked on #411 (PR 1) — depends on the VectorType / DataType::Vector introduced there. Review/merge after #411. The diff shown here includes #411's commits until it merges.

Roadmap (#410): PR 1 type system (#411) → PR 2 (this) Arrow conversion + parquet inline read → PR 3 dedicated .vector. files.

Changes

  • VectorType ↔ Arrow FixedSizeList conversion in arrow/mod.rs (both directions), replacing PR 1's Unsupported. Forward: Vector → FixedSizeList(Field("element", elem, element_nullable), length). Reverse: negative sizes are rejected by u32::try_from, and the result is routed through VectorType::try_new, which rejects length 0 and invalid element types.
  • The read path builds the target Arrow schema from paimon_type_to_arrow, so an inline FixedSizeList vector column now materializes directly.
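The two conversion directions above can be sketched roughly as follows. This is a std-only illustration, not the code in arrow/mod.rs: `ArrowElem`, `FixedSizeList`, `VectorElem`, `vector_to_arrow`, and `arrow_to_vector` are hypothetical stand-ins for the arrow-rs and paimon-rust types, and the `VectorType` here is a simplified model in which invalid element types are rejected while mapping rather than inside `try_new`.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for the Arrow element types (not arrow-rs `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum ArrowElem { Boolean, Int8, Int16, Int32, Int64, Float32, Float64, Utf8 }

/// Hypothetical stand-in for Arrow's `FixedSizeList(Field, i32)`.
#[derive(Debug, Clone, PartialEq)]
struct FixedSizeList {
    field_name: &'static str,
    element: ArrowElem,
    element_nullable: bool,
    size: i32, // Arrow stores the list size as a signed 32-bit integer
}

/// Element types the PR lists as valid for VECTOR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VectorElem { Boolean, TinyInt, SmallInt, Int, BigInt, Float, Double }

/// Simplified model of VectorType (assumption: the real type differs in detail).
#[derive(Debug, Clone, PartialEq)]
struct VectorType { element: VectorElem, element_nullable: bool, length: u32 }

impl VectorType {
    /// Accepts only lengths in [1, i32::MAX].
    fn try_new(element: VectorElem, element_nullable: bool, length: u32) -> Result<Self, String> {
        if length == 0 || length > i32::MAX as u32 {
            return Err(format!("vector length {} outside [1, i32::MAX]", length));
        }
        Ok(Self { element, element_nullable, length })
    }
}

/// Forward direction: Vector -> FixedSizeList(Field("element", ..), length).
fn vector_to_arrow(v: &VectorType) -> FixedSizeList {
    let element = match v.element {
        VectorElem::Boolean => ArrowElem::Boolean,
        VectorElem::TinyInt => ArrowElem::Int8,
        VectorElem::SmallInt => ArrowElem::Int16,
        VectorElem::Int => ArrowElem::Int32,
        VectorElem::BigInt => ArrowElem::Int64,
        VectorElem::Float => ArrowElem::Float32,
        VectorElem::Double => ArrowElem::Float64,
    };
    FixedSizeList {
        field_name: "element",
        element,
        element_nullable: v.element_nullable,
        // The cast is lossless for values built via try_new (length <= i32::MAX).
        size: v.length as i32,
    }
}

/// Reverse direction: negative sizes fail at u32::try_from, zero fails in try_new.
fn arrow_to_vector(list: &FixedSizeList) -> Result<VectorType, String> {
    let length = u32::try_from(list.size)
        .map_err(|_| format!("negative list size {}", list.size))?;
    let element = match list.element {
        ArrowElem::Boolean => VectorElem::Boolean,
        ArrowElem::Int8 => VectorElem::TinyInt,
        ArrowElem::Int16 => VectorElem::SmallInt,
        ArrowElem::Int32 => VectorElem::Int,
        ArrowElem::Int64 => VectorElem::BigInt,
        ArrowElem::Float32 => VectorElem::Float,
        ArrowElem::Float64 => VectorElem::Double,
        ArrowElem::Utf8 => return Err("unsupported vector element type Utf8".to_string()),
    };
    VectorType::try_new(element, list.element_nullable, length)
}
```

In this model a valid vector type round-trips unchanged, while a list with size -1, size 0, or a Utf8 element is rejected on the reverse path.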

Unchanged boundaries: vectors are still not filter-pushable, not castable, not usable as partition or BTree keys, and have no binary_row datum. Writing and dedicated .vector. files remain out of scope.

Testing

cargo test -p paimon --lib: 901 passed. cargo fmt --all -- --check: clean. cargo clippy -p paimon --all-targets --features fulltext,vortex,mosaic -- -D warnings: clean. Adds 5 Arrow conversion unit tests plus an end-to-end parquet test that reads a FixedSizeList<Float32> column with a null row (the downcast asserts value_length, child values, and the null bitmap).
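For reviewers less familiar with the layout the end-to-end test asserts on, here is a std-only model of a FixedSizeList<Float32> column. `FixedSizeF32List`, `from_rows`, and `row` are hypothetical names for illustration and are not the arrow-rs API. The sketch relies on one Arrow layout property: a null row still occupies value_length slots in the flat child buffer, and only the validity bitmap marks it as null.

```rust
/// Std-only model of a FixedSizeList<Float32> column (hypothetical, not arrow-rs).
struct FixedSizeF32List {
    value_length: usize,
    /// Flat child buffer holding `rows * value_length` values.
    values: Vec<f32>,
    /// One validity flag per row; `false` marks a null row.
    validity: Vec<bool>,
}

impl FixedSizeF32List {
    /// Builds the column from optional rows, padding null rows with zeros.
    fn from_rows(value_length: usize, rows: &[Option<Vec<f32>>]) -> Result<Self, String> {
        let mut values = Vec::with_capacity(rows.len() * value_length);
        let mut validity = Vec::with_capacity(rows.len());
        for row in rows {
            match row {
                Some(v) if v.len() == value_length => {
                    values.extend_from_slice(v);
                    validity.push(true);
                }
                Some(v) => {
                    return Err(format!("row has {} values, expected {}", v.len(), value_length));
                }
                None => {
                    // A null row still takes up `value_length` child slots.
                    values.extend(std::iter::repeat(0.0).take(value_length));
                    validity.push(false);
                }
            }
        }
        Ok(Self { value_length, values, validity })
    }

    /// Number of rows in the column.
    fn len(&self) -> usize {
        self.validity.len()
    }

    /// Returns the slice for row `i`, or `None` if the row is null.
    fn row(&self, i: usize) -> Option<&[f32]> {
        if !self.validity[i] {
            return None;
        }
        let start = i * self.value_length;
        Some(&self.values[start..start + self.value_length])
    }
}
```

With three rows of length 2 where the middle row is null, the child buffer in this model holds six values, and `row(1)` returns `None` while the neighbouring rows keep their offsets.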

@JunRuiLee JunRuiLee marked this pull request as draft June 26, 2026 03:02
@JunRuiLee JunRuiLee force-pushed the feat/vector-type-pr2 branch 2 times, most recently from e889567 to 95566c3 Compare June 26, 2026 03:40
Add VectorType (VECTOR<element, length>) as a standalone fixed-size dense
vector type: u32 length validated to [1, i32::MAX], element type restricted
to BOOLEAN/TINYINT/SMALLINT/INT/BIGINT/FLOAT/DOUBLE, JSON serde matching the
Java wire shape, and a VECTOR<ELEM, N> Display.
Add the DataType::Vector variant and fill every exhaustive match: type-system
semantics (is_nullable, copy_with_nullable, contains_row_type) implemented, and
IO/integration sites returning explicit Unsupported (PR 1 supports the type in
schemas; vector data IO lands in a later PR). Deserialize is hand-written to
compose with the untagged DataType enum.
@JunRuiLee JunRuiLee force-pushed the feat/vector-type-pr2 branch from 95566c3 to c1d8c2b Compare June 26, 2026 03:47