[python] Expose DataFrame-style scan/split planning API#415

Open
JunRuiLee wants to merge 4 commits into apache:main from JunRuiLee:feat/py-read-builder-pr1

@JunRuiLee

Purpose

First PR in the series exposing PyPaimon's DataFrame read path to Rust. Refs #413.

Exposes the Rust core's ReadBuilder → new_scan() → plan() → DataSplit flow
through bindings/python:

  • table.new_read_builder().with_projection([...]).with_limit(n).new_scan().plan()
  • Plan.splits() returns Split objects that are picklable (opaque
    serde_json bytes), so they can cross process boundaries — groundwork for
    Ray distributed reads and a later read(splits) → Arrow step.
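
For orientation, the call chain and the pickle round-trip can be sketched with pure-Python stand-ins. This is only an illustration, not the binding itself: the `Table`, `ReadBuilder`, `Scan`, `Plan`, and `Split` classes below are hypothetical mocks, the file names are made up, and the JSON payload merely imitates "opaque serde_json bytes".

```python
import json
import pickle


class Split:
    """Hypothetical stand-in for the binding's Split: an opaque byte payload."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def __reduce__(self):
        # Only the opaque bytes are pickled; unpickling rebuilds a Split from them.
        return (Split, (self._payload,))


class Plan:
    def __init__(self, splits):
        self._splits = list(splits)

    def splits(self):
        return list(self._splits)


class Scan:
    def __init__(self, files):
        self._files = files

    def plan(self):
        # Mock planning: one split per file, encoded as JSON bytes.
        return Plan(Split(json.dumps({"file": f}).encode()) for f in self._files)


class ReadBuilder:
    def __init__(self, files):
        self._files = files
        self._projection = None
        self._limit = None

    def with_projection(self, columns):
        self._projection = list(columns)  # recorded; planning does not use it
        return self

    def with_limit(self, n):
        self._limit = n  # recorded only; limit semantics are not modelled here
        return self

    def new_scan(self):
        return Scan(self._files)


class Table:
    def __init__(self, files):
        self._files = files

    def new_read_builder(self):
        return ReadBuilder(self._files)


# Hypothetical table backed by two made-up data files.
table = Table(["data-0.parquet", "data-1.parquet"])
plan = (
    table.new_read_builder()
    .with_projection(["id", "name"])
    .with_limit(10)
    .new_scan()
    .plan()
)
splits = plan.splits()

# Round-trip through pickle, as would happen when crossing a process boundary.
restored = pickle.loads(pickle.dumps(splits))
```

In this mock the builder methods return `self` so the chain reads like the one in the PR; whether the real binding returns the same object or a new one is not stated here.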

Out of scope (later PRs): with_filter / Predicate conversion, read(splits)
data reading, pypaimon-side wiring.

Notes

  • with_projection is accepted and forwarded, but in PR1 it does not affect
    the planned splits — Rust core projection only applies to new_read() (data
    reading), not new_scan()/planning. It is staged here for the later
    read(splits) step. Tests assert only that planning succeeds.
  • Split is an opaque payload: the serde encoding is an implementation detail,
    not a stable schema; only same/compatible-version round-trip is guaranteed.
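
Since the payload is opaque, calling code would presumably move splits around as pickled blobs without ever inspecting them. A minimal sketch of that pattern follows, assuming a round-robin assignment to workers; `OpaqueSplit`, `assign_round_robin`, and `worker_receive` are hypothetical names for illustration and are not part of this PR.

```python
import pickle
from typing import List


class OpaqueSplit:
    """Hypothetical stand-in for the binding's Split (not the real class)."""

    def __init__(self, payload: bytes):
        self._payload = payload  # never parsed on the Python side

    def __reduce__(self):
        # Pickle carries only the raw bytes.
        return (OpaqueSplit, (self._payload,))


def assign_round_robin(splits, num_workers: int) -> List[List[bytes]]:
    """Pickle each split and deal the resulting blobs out to workers in turn."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    buckets: List[List[bytes]] = [[] for _ in range(num_workers)]
    for i, split in enumerate(splits):
        buckets[i % num_workers].append(pickle.dumps(split))
    return buckets


def worker_receive(blobs: List[bytes]) -> List[OpaqueSplit]:
    """Worker side: rebuild splits from blobs without looking inside them."""
    return [pickle.loads(blob) for blob in blobs]


# Five made-up splits dealt across two workers.
demo_splits = [OpaqueSplit(bytes([i])) for i in range(5)]
buckets = assign_round_robin(demo_splits, 2)
first_worker_splits = worker_receive(buckets[0])
```

Both sides of such a hand-off would need to run the same or a compatible version of the binding, since only that round-trip is guaranteed.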

Tests

bindings/python/tests/test_read.py — 7 tests: scan planning
(projection/limit/len) + pickle round-trip.
