[type] Add native VectorType to the type system (#410 PR 1) #411

Open. JunRuiLee wants to merge 2 commits into apache:main from JunRuiLee:feat/vector-type-pr1

JunRuiLee commented Jun 25, 2026

PR 1 of reading Paimon VECTOR columns in paimon-rust, tracked in #410. Adds the type-system foundation; reading vector data is deferred to later PRs.

Roadmap (#410): PR 1 (this) type system → PR 2 Arrow conversion + inlined reads → PR 3 dedicated .vector. files.

Changes

  • New VectorType mirroring upstream org.apache.paimon.types.VectorType: u32 length in [1, i32::MAX], element type restricted to BOOLEAN/TINYINT/SMALLINT/INT/BIGINT/FLOAT/DOUBLE; JSON serde matching the Java wire shape; Display as VECTOR<FLOAT, 128>.
  • New DataType::Vector variant threaded through all exhaustive matches: type-system semantics (is_nullable, copy_with_nullable, contains_row_type) implemented; IO/integration sites return explicit Unsupported; BTree key comparator rejects VECTOR rather than falling back to byte comparison.
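
The construction rules in the first bullet can be sketched as follows. This is a minimal illustration, not the code in this PR: `ElementKind`, `VectorTypeSketch`, `VectorTypeError`, and `try_new` are hypothetical names, and the real `VectorType` presumably holds a full `DataType` as its element rather than a small enum.

```rust
/// Hypothetical stand-in for the element type (assumption: the real crate
/// nests a full DataType here).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ElementKind {
    Boolean,
    TinyInt,
    SmallInt,
    Int,
    BigInt,
    Float,
    Double,
    /// Included only to demonstrate a rejected element type.
    Varchar,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum VectorTypeError {
    InvalidLength(u32),
    UnsupportedElement(ElementKind),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct VectorTypeSketch {
    element: ElementKind,
    length: u32,
}

impl VectorTypeSketch {
    /// Upper bound from the PR description: i32::MAX, stored as u32.
    pub const MAX_LENGTH: u32 = i32::MAX as u32;

    /// Validates the length range [1, i32::MAX] and the element allow-list.
    pub fn try_new(element: ElementKind, length: u32) -> Result<Self, VectorTypeError> {
        if !(1..=Self::MAX_LENGTH).contains(&length) {
            return Err(VectorTypeError::InvalidLength(length));
        }
        let allowed = matches!(
            element,
            ElementKind::Boolean
                | ElementKind::TinyInt
                | ElementKind::SmallInt
                | ElementKind::Int
                | ElementKind::BigInt
                | ElementKind::Float
                | ElementKind::Double
        );
        if !allowed {
            return Err(VectorTypeError::UnsupportedElement(element));
        }
        Ok(Self { element, length })
    }

    pub fn element(&self) -> ElementKind {
        self.element
    }

    pub fn length(&self) -> u32 {
        self.length
    }
}
```

Under these assumptions, `VectorTypeSketch::try_new(ElementKind::Float, 128)` succeeds, while a length of 0, a length above `i32::MAX`, or a `Varchar` element is rejected at construction time.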

After this PR a VECTOR column deserializes/validates/serializes/renders correctly; vector data IO returns Unsupported — the intended PR 1 boundary.
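
The PR 1 boundary described above might look roughly like this at a single match site. It is a sketch under stated assumptions: `DataTypeSketch`, `SketchError`, and `read_width_bytes` are hypothetical names chosen for illustration and are not taken from the paimon crate.

```rust
/// Hypothetical, heavily reduced stand-in for the crate's DataType enum.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DataTypeSketch {
    Int { nullable: bool },
    Float { nullable: bool },
    Vector { length: u32, nullable: bool },
}

/// Hypothetical error type; the PR only states that IO sites return Unsupported.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SketchError {
    Unsupported(String),
}

impl DataTypeSketch {
    /// Type-system semantics are implemented for every variant, VECTOR included.
    pub fn is_nullable(&self) -> bool {
        match self {
            DataTypeSketch::Int { nullable }
            | DataTypeSketch::Float { nullable }
            | DataTypeSketch::Vector { nullable, .. } => *nullable,
        }
    }

    /// Returns a copy of the type with the requested nullability.
    pub fn copy_with_nullable(&self, nullable: bool) -> Self {
        match self {
            DataTypeSketch::Int { .. } => DataTypeSketch::Int { nullable },
            DataTypeSketch::Float { .. } => DataTypeSketch::Float { nullable },
            DataTypeSketch::Vector { length, .. } => DataTypeSketch::Vector {
                length: *length,
                nullable,
            },
        }
    }

    /// Hypothetical IO-side helper: the match stays exhaustive (no `_` arm),
    /// so VECTOR needs an explicit arm, and that arm reports Unsupported
    /// instead of guessing a layout.
    pub fn read_width_bytes(&self) -> Result<usize, SketchError> {
        match self {
            DataTypeSketch::Int { .. } => Ok(4),
            DataTypeSketch::Float { .. } => Ok(4),
            DataTypeSketch::Vector { .. } => Err(SketchError::Unsupported(
                "VECTOR data IO is not supported yet".to_string(),
            )),
        }
    }
}
```

Keeping each match exhaustive means the compiler flags every site that has not yet decided how to treat the new variant, which is one plausible reason to prefer explicit Unsupported arms over a catch-all.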

Testing

cargo test -p paimon --lib: 895 passed; cargo clippy -p paimon --all-targets: clean; cargo check passes for paimon-datafusion and paimon-c.

Comment thread crates/paimon/src/spec/types.rs Outdated

impl Display for VectorType {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        write!(f, "VECTOR<{}, {}>", self.element_sql_name(), self.length)?;

Member

Display drops the element type nullability. For example, a VECTOR<FLOAT NOT NULL, 4>-like type would render as VECTOR<FLOAT, 4>, while serde preserves the element nullability. Please either render element_type with its SQL string or reject/normalize non-null vector element types.

Author

Fixed — Display now renders the element's nullability too (e.g. VECTOR<FLOAT NOT NULL, 4>), matching elementType.asSQLString() and the serde form.
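
For readers following the thread, the fix discussed here could be sketched as below. This is an illustrative reconstruction, not the code that was pushed: `ElementSketch` and `VectorDisplaySketch` are hypothetical types, and the sketch deliberately leaves out how the vector's own (outer) nullability is rendered.

```rust
use std::fmt;

/// Hypothetical element type: a SQL name plus a nullability flag.
pub struct ElementSketch {
    pub sql_name: &'static str,
    pub nullable: bool,
}

impl fmt::Display for ElementSketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.sql_name)?;
        // Non-nullable elements carry an explicit suffix, so the rendered
        // string keeps the same information as the serde form.
        if !self.nullable {
            write!(f, " NOT NULL")?;
        }
        Ok(())
    }
}

/// Hypothetical vector type used only to show the rendering.
pub struct VectorDisplaySketch {
    pub element: ElementSketch,
    pub length: u32,
}

impl fmt::Display for VectorDisplaySketch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Delegate to the element's own Display instead of printing a bare name.
        write!(f, "VECTOR<{}, {}>", self.element, self.length)
    }
}
```

With a non-nullable `FLOAT` element of length 4 this renders `VECTOR<FLOAT NOT NULL, 4>`, and with a nullable `FLOAT` element of length 128 it renders `VECTOR<FLOAT, 128>`, matching the two examples quoted in this PR.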

JunRuiLee force-pushed the feat/vector-type-pr1 branch from 31bcc4d to 730842b on June 26, 2026 03:40
Add VectorType (VECTOR<element, length>) as a standalone fixed-size dense
vector type: u32 length validated to [1, i32::MAX], element type restricted
to BOOLEAN/TINYINT/SMALLINT/INT/BIGINT/FLOAT/DOUBLE, JSON serde matching the
Java wire shape, and a VECTOR<ELEM, N> Display.
Add the DataType::Vector variant and fill every exhaustive match: type-system
semantics (is_nullable, copy_with_nullable, contains_row_type) implemented, and
IO/integration sites returning explicit Unsupported (PR 1 supports the type in
schemas; vector data IO lands in a later PR). Deserialize is hand-written to
compose with the untagged DataType enum.
JunRuiLee force-pushed the feat/vector-type-pr1 branch from 730842b to e8478dd on June 26, 2026 03:47