perf(scan): intra-file decode parallelism — sub-split large chunk spans by lukekim · Pull Request #8400 · vortex-data/vortex

lukekim · 2026-06-12T21:20:52Z

Summary

SplitBy::Layout now sub-divides any span between adjacent chunk boundaries wider than IDEAL_SPLIT_SIZE (100k rows) into evenly sized row-range splits, so a file with few large chunks (e.g. a single flat layout, or byte-targeted int columns that coalesce to ~262k rows/chunk) decodes across multiple cores instead of one.

Correctness

Subdivision only inserts points strictly between existing adjacent boundaries — it never moves or removes one — so the half-open ranges consumers derive (tuple_windows) remain a contiguous, non-overlapping, exact partition of the same rows. Spans at or below the cap pass through untouched (fast-path no-op). All boundary consumers (RepeatedScan, VortexFile::splits(), the DataFusion repartitioner) operate on arbitrary ranges; sub-chunk ranges were already exercised by SplitBy::RowCount. The arithmetic saturates at u64::MAX.

API/Observable behavior

For files whose merged (projected-column) chunk boundaries leave spans > 100k rows, VortexFile::splits() (incl. Python bindings) returns more, smaller ranges; scans emit smaller batches; DataFusion gets real repartitioning where a single-chunk file previously collapsed to one partition. Fine-grained files (e.g. ~8k-row string chunks from the default 1 MiB block target) are untouched.

Testing

Unit/property/overflow tests for subdivide_large_spans (no-op, large single chunk, mixed gaps, exact-coverage property, u64::MAX boundary).
E2E: 250k-row single flat chunk → splits all ≤ the cap, contiguous, exact endpoints; full + filtered scans match the unsplit data.
E2E (rstest): fixed-size SplitBy::RowCount scans (unaligned 33,333 and exceeds-file 300,000 cases).
E2E: ~120-byte string column via the default write strategy keeps its natural fine-grained chunk splits (bounded relative to the cap, not to writer defaults).
cargo nextest run -p vortex-layout -p vortex-file — 174 passed on this branch.

SplitBy::Layout now sub-divides any span between adjacent chunk boundaries wider than IDEAL_SPLIT_SIZE (100k rows) into evenly sized row-range splits, so files with few large chunks decode across multiple cores. Subdivision only inserts points strictly between existing adjacent boundaries: the half-open ranges consumers derive remain a contiguous, non-overlapping, exact partition of the same rows. The arithmetic saturates at u64::MAX. Tests: unit/property/overflow coverage for the subdivision helper, an end-to-end test that a 250k-row single flat chunk scans correctly across sub-divided splits with bounded split sizes, an rstest-parameterized end-to-end test for fixed-size SplitBy::RowCount scans (previously only covered at the boundary-math level), and an end-to-end test that ~120-byte string columns written with the default strategy keep their natural ~8k-row chunk splits untouched by the cap. Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>

Introduce a test-local MAX_SPLIT_ROWS mirroring the private IDEAL_SPLIT_SIZE instead of repeating 100_000, and bound the string-chunk test relative to the cap (< MAX_SPLIT_ROWS / 4) rather than pinning the current ~8k repartition default. Signed-off-by: Luke Kim <80174+lukekim@users.noreply.github.com>

codspeed-hq · 2026-06-12T21:30:21Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 3 improved benchmarks
❌ 3 regressed benchmarks
✅ 1531 untouched benchmarks
⏩ 10 skipped benchmarks¹

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`varbinview_large`	112.9 µs	131.5 µs	-14.15%
❌	Simulation	`decompress_rd[f64, (100000, 0.01)]`	845.9 µs	981.6 µs	-13.83%
❌	Simulation	`decompress_rd[f64, (100000, 0.1)]`	845.9 µs	981.6 µs	-13.82%
⚡	Simulation	`decompress_rd[f64, (100000, 0.0)]`	1,024.6 µs	845.9 µs	+21.12%
⚡	Simulation	`decompress_rd[f32, (100000, 0.0)]`	586.8 µs	499.3 µs	+17.51%
⚡	Simulation	`encode_varbin[(1000, 2)]`	176.8 µs	157.8 µs	+12.08%

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing spiceai:lukim/scan-split-large-chunks-develop (e8047a4) with develop (d0013ff)}

10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩

lukekim added 3 commits June 12, 2026 14:01

Merge branch 'develop' into lukim/scan-split-large-chunks-develop

e8047a4

lukekim requested a review from a team June 12, 2026 21:20

gatesn added the action/benchmark Trigger full benchmarks to run on this PR label Jun 12, 2026

lukekim mentioned this pull request Jun 12, 2026

Merge upstream Vortex 0.75.0 into spiceai-54 (DataFusion 53 → 54) spiceai/vortex#65

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(scan): intra-file decode parallelism — sub-split large chunk spans#8400

perf(scan): intra-file decode parallelism — sub-split large chunk spans#8400
lukekim wants to merge 3 commits into
vortex-data:developfrom
spiceai:lukim/scan-split-large-chunks-develop

lukekim commented Jun 12, 2026

Uh oh!

codspeed-hq Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lukekim commented Jun 12, 2026

Summary

Correctness

API/Observable behavior

Testing

Uh oh!

codspeed-hq Bot commented Jun 12, 2026

Merging this PR will not alter performance

Performance Changes

Footnotes

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants