Optimize buffer ops #8322
Merging this PR will degrade performance by 12.18%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 20.7 µs | 35.5 µs | -41.8% |
| ❌ | Simulation | compare[48] | 213 µs | 300.5 µs | -29.11% |
| ❌ | Simulation | compare[50] | 227.7 µs | 319 µs | -28.63% |
| ❌ | Simulation | compare[49] | 228.2 µs | 317.1 µs | -28.04% |
| ❌ | Simulation | compare[44] | 207.5 µs | 287.6 µs | -27.87% |
| ❌ | Simulation | compare[47] | 223.5 µs | 309.1 µs | -27.68% |
| ❌ | Simulation | compare[46] | 218.5 µs | 302.1 µs | -27.67% |
| ❌ | Simulation | compare[40] | 190.7 µs | 263.4 µs | -27.61% |
| ❌ | Simulation | compare[44] | 212.1 µs | 292.2 µs | -27.4% |
| ❌ | Simulation | compare[45] | 218.9 µs | 300.8 µs | -27.23% |
| ❌ | Simulation | compare[43] | 209.2 µs | 287.5 µs | -27.21% |
| ❌ | Simulation | compare[42] | 204.5 µs | 281 µs | -27.21% |
| ❌ | Simulation | compare[40] | 195.6 µs | 268.1 µs | -27.04% |
| ❌ | Simulation | compare[43] | 214.2 µs | 292.4 µs | -26.74% |
| ❌ | Simulation | compare[41] | 204.5 µs | 279.1 µs | -26.73% |
| ❌ | Simulation | compare[42] | 209.4 µs | 284.9 µs | -26.5% |
| ❌ | Simulation | compare[41] | 209.3 µs | 283.8 µs | -26.23% |
| ❌ | Simulation | compare[31] | 157.7 µs | 213.6 µs | -26.18% |
| ❌ | Simulation | compare[39] | 199.9 µs | 270.7 µs | -26.16% |
| ❌ | Simulation | compare[38] | 195.5 µs | 264.2 µs | -26.01% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing adamg/buffer-slice-fast (11bb86c) with develop (f67b594)
Footnotes
- 10 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.
## Summary
Adds a basic benchmark for slicing, including an Arrow baseline. Hopefully building up to #8322, but I want a baseline first.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
fd451bf to
341039a
Compare
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.090x ➖
datafusion / vortex-file-compressed (1.090x ➖, 0↑ 4↓)
No file size changes detected. |
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.112x ❌, 0↑ 6↓)
datafusion / vortex-compact (1.080x ➖, 0↑ 2↓)
datafusion / parquet (1.121x ❌, 0↑ 5↓)
duckdb / vortex-file-compressed (1.141x ❌, 0↑ 8↓)
duckdb / vortex-compact (1.083x ➖, 0↑ 4↓)
duckdb / parquet (1.102x ❌, 0↑ 3↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)
|
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.062x ➖, 0↑ 2↓)
datafusion / vortex-compact (0.932x ➖, 10↑ 0↓)
datafusion / parquet (1.041x ➖, 1↑ 4↓)
datafusion / arrow (0.960x ➖, 4↑ 3↓)
duckdb / vortex-file-compressed (1.041x ➖, 0↑ 6↓)
duckdb / vortex-compact (1.000x ➖, 1↑ 0↓)
duckdb / parquet (0.990x ➖, 0↑ 0↓)
duckdb / duckdb (1.006x ➖, 0↑ 0↓)
File Size Changes (9 files changed, +0.2% overall, 9↑ 0↓)
|
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.991x ➖, 6↑ 1↓)
datafusion / vortex-compact (1.003x ➖, 0↑ 2↓)
datafusion / parquet (0.996x ➖, 3↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 2↑ 1↓)
duckdb / vortex-compact (0.994x ➖, 0↑ 1↓)
duckdb / parquet (0.999x ➖, 1↑ 1↓)
duckdb / duckdb (0.997x ➖, 0↑ 3↓)
File Size Changes (7 files changed, +0.0% overall, 7↑ 0↓)
|
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.022x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.020x ➖, 0↑ 0↓)
duckdb / parquet (1.025x ➖, 0↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)
|
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.971x ➖, 1↑ 1↓)
datafusion / vortex-compact (1.014x ➖, 2↑ 1↓)
datafusion / parquet (0.920x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.853x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.968x ➖, 0↑ 1↓)
duckdb / parquet (0.972x ➖, 0↑ 0↓)
|
BENCHMARK FAILED |
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.993x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.006x ➖, 0↑ 0↓)
datafusion / parquet (0.997x ➖, 0↑ 0↓)
datafusion / arrow (0.926x ➖, 7↑ 0↓)
duckdb / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.003x ➖, 0↑ 0↓)
duckdb / parquet (1.000x ➖, 0↑ 0↓)
duckdb / duckdb (0.998x ➖, 0↑ 0↓)
File Size Changes (26 files changed, -0.0% overall, 8↑ 18↓)
|
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.969x ➖, 13↑ 2↓)
datafusion / parquet (0.913x ➖, 13↑ 0↓)
duckdb / vortex-file-compressed (1.011x ➖, 4↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
duckdb / duckdb (1.013x ➖, 0↑ 1↓)
File Size Changes (107 files changed, -0.0% overall, 56↑ 51↓)
|
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.873x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.867x ➖, 5↑ 0↓)
datafusion / parquet (1.267x ➖, 1↑ 13↓)
duckdb / vortex-file-compressed (0.875x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.933x ➖, 0↑ 0↓)
duckdb / parquet (0.917x ➖, 0↑ 0↓)
|
Benchmarks: Appian on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.008x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.031x ➖, 0↑ 0↓)
duckdb / parquet (1.006x ➖, 0↑ 0↓)
duckdb / duckdb (1.007x ➖, 0↑ 0↓)
File Size Changes (4 files changed, -0.0% overall, 1↑ 3↓)
|
Benchmarks: Compression
Vortex (geomean): 1.016x ➖
unknown / unknown (1.045x ➖, 2↑ 29↓)
|
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.870x ➖, 4↑ 1↓)
datafusion / vortex-compact (0.867x ➖, 3↑ 0↓)
datafusion / parquet (1.044x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (0.929x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.065x ➖, 0↑ 2↓)
duckdb / parquet (0.921x ➖, 0↑ 0↓)
|

I made #8162 that optimizes some of these code paths

06ac2f8 to
dde48e4
Compare

@robert3005 do you want to merge that first?

that would be ideal, the pr I made is a revival of an older pr already

I'll review it
dde48e4 to
ae9bc12
Compare
d4f17f3 to
64fb797
Compare
    }
    } else {
    // Use bitvec for unaligned bit copying.
    let self_slice = self

We have benchmarked this and bitvec is faster for anything that's bigger than 128 bits. I think we want to keep it.
bf88b71 to
b8ad2b4
Compare
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
b8ad2b4 to
a7bdc00
Compare
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
e465b55 to
ad6949b
Compare
be356c3 to
11bb86c
Compare
Summary
This PR includes a few optimizations for buffer-level ops:
- `BitBufferMut::append_buffer` uses arrow's word-sized append for unaligned bit buffers instead of bitvec, which copies one bit at a time.
- … `Alignment`, instead of having less specific checks in different call sites.

After this PR is merged, I'll follow up and remove `bitvec` as a dependency. It's currently used in a couple of pretty random places, and I suspect there's nothing special about them compared to our own `BitBuffer`.
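
For readers unfamiliar with the distinction in the first bullet, here is a minimal sketch of appending an unaligned bit range one bit at a time versus up to 64 bits at a time. It is not the code in this PR, and it does not use the arrow or bitvec APIs: the `Bits` type and the `append_bitwise`, `append_wordwise`, `read_bits` and `write_bits` names are hypothetical, and the LSB-first bit layout is an assumption chosen to match Arrow's convention.

```rust
/// Hypothetical LSB-first bit buffer for illustration only
/// (not Vortex's `BitBufferMut`).
pub struct Bits {
    bytes: Vec<u8>,
    len: usize, // number of valid bits; bits past `len` are kept at zero
}

impl Bits {
    pub fn new() -> Self {
        Self { bytes: Vec::new(), len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    pub fn push(&mut self, bit: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if bit {
            self.bytes[self.len / 8] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Baseline: copy `n` bits starting at bit `offset` of `src`, one bit per iteration.
    pub fn append_bitwise(&mut self, src: &[u8], offset: usize, n: usize) {
        assert!(offset + n <= src.len() * 8);
        for i in offset..offset + n {
            self.push((src[i / 8] >> (i % 8)) & 1 == 1);
        }
    }

    /// Word-sized: copy the same range in chunks of up to 64 bits.
    pub fn append_wordwise(&mut self, src: &[u8], offset: usize, n: usize) {
        assert!(offset + n <= src.len() * 8);
        let mut done = 0;
        while done < n {
            let take = (n - done).min(64);
            let word = read_bits(src, offset + done, take);
            self.write_bits(word, take);
            done += take;
        }
    }

    /// OR the low `n` bits of `word` into the buffer at the current bit length.
    fn write_bits(&mut self, word: u64, n: usize) {
        let shift = self.len % 8;
        let start = self.len / 8;
        let new_len = self.len + n;
        self.bytes.resize((new_len + 7) / 8, 0);
        // A 64-bit word shifted by up to 7 bits can span 9 bytes, so widen to u128.
        let le = ((word as u128) << shift).to_le_bytes();
        let nbytes = (shift + n + 7) / 8;
        for k in 0..nbytes {
            self.bytes[start + k] |= le[k];
        }
        self.len = new_len;
    }
}

/// Read `n` bits (1..=64) starting at `bit_offset`, returned in the low bits of a u64.
fn read_bits(src: &[u8], bit_offset: usize, n: usize) -> u64 {
    let first = bit_offset / 8;
    let shift = bit_offset % 8;
    let nbytes = (shift + n + 7) / 8; // at most 9 bytes
    let mut buf = [0u8; 16];
    buf[..nbytes].copy_from_slice(&src[first..first + nbytes]);
    let wide = u128::from_le_bytes(buf) >> shift;
    let mask = if n == 64 { u64::MAX } else { (1u64 << n) - 1 };
    (wide as u64) & mask
}
```

Both methods produce the same bits; the word-sized path just does roughly one loop iteration per 64 bits instead of one per bit. Which approach is actually faster for a given length is a benchmarking question, as the review thread above shows.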