perf(shapes): faster datashader circle rendering + matplotlib-fidelity fixes #729
Merged
2fbec2c to ce587c6
Circles (Point+radius) were buffered to polygons at shapely's default resolution=16 (65 vertices per circle) before datashader rasterization. For large circle sets this coordinate explosion dominates the render (buffering, per-vertex transforms and polygon aggregation): 91k circles already produce ~5.9M coordinates. Choose the buffer resolution from the on-screen pixel radius of the largest disc (_circle_buffer_quad_segs / _circle_quad_segs): 4 segments per quadrant for small discs (<=8px, where extra vertices are sub-pixel), 8 up to 32px, and shapely's full 16 once discs are large enough to show facets. The result stays faithful (IoU >=0.98 vs the 65-vertex circle) and handles per-circle varying radii. End-to-end on Visium HD (single coordinate system): 91k circles 2.0s -> 1.5s, 352k circles 8.3s -> 4.9s. Note: this shifts the datashader-circle visual baselines (17- vs 65-vertex circles); regenerate them from CI artifacts.
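A minimal sketch of the tiered choice described above, assuming the caller already knows the largest disc's on-screen radius in pixels. `choose_quad_segs` and `ring_coord_count` are hypothetical names, not the PR's `_circle_buffer_quad_segs`. A buffered point's ring has `quad_segs` segments per quarter circle plus a repeated closing coordinate, which is where the 65 (default) and 17 (coarse) coordinate counts come from.

```python
def choose_quad_segs(max_radius_px: float) -> int:
    """Hypothetical: pick shapely's quad_segs from the largest disc's pixel radius."""
    if max_radius_px <= 8:
        return 4   # extra vertices would be sub-pixel
    if max_radius_px <= 32:
        return 8
    return 16      # shapely's default once facets would be visible


def ring_coord_count(quad_segs: int) -> int:
    # 4 quadrants * quad_segs segments, plus the repeated closing coordinate
    return 4 * quad_segs + 1


n_circles = 91_000
default_total = n_circles * ring_coord_count(16)                 # the "~5.9M coords"
coarse_total = n_circles * ring_coord_count(choose_quad_segs(3.0))
print(default_total, coarse_total)  # 5915000 1547000
```

Keying the choice on the *largest* disc keeps the rule conservative when radii vary per circle: no disc is coarsened more than the biggest one allows.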
ce587c6 to b6927ac
Codecov Report: ✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## main #729 +/- ##
==========================================
+ Coverage 79.21% 79.38% +0.17%
==========================================
Files 17 17
Lines 4566 4604 +38
Branches 1026 1031 +5
==========================================
+ Hits 3617 3655 +38
Misses 599 599
Partials 350 350
b6927ac to bb22094
On the datashader backend, buffering every circle (Point+radius) to a polygon dominates the render for large sets. A large (>50k), uniform-radius, outline-free circle element is effectively a dot field, where a filled disc and a spread point look the same, so rasterize its centroids as radius-faithful points (the _circles_render_as_points gate plus a radius-aware spread in _datashader_points) instead of buffering. This is datashader-backend behavior; use method="matplotlib" for a pixel-exact rendering. Elements with per-circle varying radii, outlines, or custom shapes keep the polygon path. `as_points` stays a simple bool (style: dots vs geometry) and is itself a speedup on both backends; it is orthogonal to this datashader optimization. Adds a 2x2 visual test (geometry/as_points x matplotlib/datashader) verifying that the four render paths look alike, including the datashader fast path matching exact matplotlib discs. End-to-end on Visium HD (single coordinate system), vs v0.4.0: 91k circles 6.85s -> 0.56s, 352k 22.95s -> 1.14s, 5.5M (002um) from impractical to ~12.7s; method="matplotlib" stays exact.
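The fast-path gate can be sketched as a plain predicate. `should_render_as_points` and `radius_spread_px` are hypothetical stand-ins for `_circles_render_as_points` and the radius-aware spread; the 50k threshold comes from the description above, and the pixel conversion assumes a single uniform data-to-pixel scale.

```python
import math
from typing import Sequence

FAST_PATH_MIN_CIRCLES = 50_000  # ">50k" threshold from the description


def should_render_as_points(radii: Sequence[float], has_outline: bool) -> bool:
    """Hypothetical gate: may this circle element be drawn as spread centroids?"""
    if len(radii) <= FAST_PATH_MIN_CIRCLES or has_outline:
        return False
    first = radii[0]
    if not math.isfinite(first):  # NaN/inf radius -> keep the polygon path
        return False
    # Only a uniform radius qualifies; a NaN later in the column also fails here.
    return all(r == first for r in radii)


def radius_spread_px(radius_data: float, px_per_data_unit: float) -> int:
    # Spread each centroid by the disc's on-screen radius so dots keep the disc size.
    return max(0, round(radius_data * px_per_data_unit))
```

For example, `should_render_as_points([1.5] * 60_000, has_outline=False)` is `True`, while changing a single radius or adding an outline sends the element back to the polygon path.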
bb22094 to 68cc388
Adds 2x2 visual tests, mirroring the circle one, for the other elements whose render paths split by backend / as_points, so divergence or breakage across paths is visible:
- points: (no color / continuous) x (matplotlib / datashader)
- labels: (fill / as_points) x (matplotlib / datashader)
- polygons: (geometry / as_points) x (matplotlib / datashader)

Colorbars disabled and marker sizes bumped so panels stay non-degenerate; each panel is titled with its (mode x backend) combination. New baselines to be generated from CI.
Generated from the py3.11-stable CI artifact. Only the 4 new 2x2 permutation tests needed baselines; the existing datashader-circle baselines stayed within tolerance under the adaptive-buffer change.
PlotTester.compare() force-shrinks every figure to a 400x300 px (5x3.75 in) thumbnail. matplotlib scatter markers are sized in points (absolute) and don't shrink with the squished axes, while datashader as_points (a data-coordinate raster) and the geometry do, so the as_points/matplotlib panel rendered ~1.6x oversized relative to the other three. Rendering the 2x2 grids at the harness canvas size and dpi makes compare()'s resize a no-op, so the point-sized scatter and the data-coordinate paths stay consistent. (Not a library bug: at a native render size, matplotlib and datashader as_points agree.)
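Assuming the harness thumbnail is 400x300 px at 80 dpi (which is what 5x3.75 in implies), the figure size that makes the resize a no-op follows directly. `harness_figsize` and the two constants are hypothetical, not PlotTester's API.

```python
HARNESS_SIZE_PX = (400, 300)  # thumbnail size compare() resizes to
HARNESS_DPI = 80              # assumption: 400 px / 5 in = 80 dpi


def harness_figsize(size_px=HARNESS_SIZE_PX, dpi=HARNESS_DPI):
    """Figure size in inches whose rendered canvas already equals the thumbnail."""
    width_px, height_px = size_px
    return (width_px / dpi, height_px / dpi)


print(harness_figsize())  # (5.0, 3.75)
```

Passing the result to `plt.subplots(2, 2, figsize=harness_figsize(), dpi=HARNESS_DPI)` renders each grid at the exact canvas the comparison uses, so point-sized markers are never rescaled afterwards.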
…ender Re-rendered at the harness canvas size: the as_points grids (circle/polygon/labels) now show matplotlib and datashader at matching sizes. The points grid retains the known render_points matplotlib-vs-datashader marker-size difference (looser sqrt(s)*dpi/100 spread calibration).
…plotlib render_points sized its datashader canvas to the full figure (fig.get_size_inches()*dpi), so when the axes was a subplot the raster was built at figure resolution and then imshow'd into the smaller axes, shrinking the markers: matplotlib and datashader point sizes diverged by the axes/figure ratio (e.g. ~1.8x in a 2x2 grid; they agreed only when the axes filled the figure). It also used a looser sqrt(size)*dpi/100 spread instead of matplotlib's exact sqrt(size)*dpi/144 marker radius. Size the points canvas to the axes box (ax.get_window_extent(), as the as_markers/as_points path already did) and use the /144 marker-radius formula for both paths. The datashader dot now matches the matplotlib scatter marker by construction, in any layout. Degenerate-extent handling (single/coincident points) is preserved. Permutation-grid sizes are set to non-overlapping values (circle 25, points 30) now that the mpl/datashader match is structural rather than tuned.
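matplotlib's scatter `s` is an area in points², so the marker diameter is `sqrt(s)` points and, at 72 points per inch, the pixel radius is `sqrt(s) * dpi / 144`. The sketch below compares that with the looser `/100` calibration and shows the shrink that occurs when a figure-sized raster is drawn into a half-width axes. The function names are illustrative only.

```python
import math


def mpl_marker_radius_px(s: float, dpi: float) -> float:
    # s is in points**2: diameter sqrt(s) pt, 1 pt = dpi/72 px, radius = half
    return math.sqrt(s) * dpi / 144


def old_spread_px(s: float, dpi: float) -> float:
    # the looser calibration this commit replaces
    return math.sqrt(s) * dpi / 100


def displayed_radius_px(spread_px: float, axes_width_px: float, canvas_width_px: float) -> float:
    # A figure-sized canvas imshow'd into the axes shrinks by axes/figure.
    return spread_px * axes_width_px / canvas_width_px


r = mpl_marker_radius_px(25, 144)
print(r, old_spread_px(25, 144))        # 5.0 7.2
print(displayed_radius_px(r, 400, 800))  # 2.5 -> half size in a half-width subplot
```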
…c marker size The render_points datashader marker-size fix (axes-box canvas + /144 spread) shifts all render_points datashader baselines (point sizes now match matplotlib in any layout). Also refreshes the points and circle permutation grids. Generated from py3.11-stable CI.
…ound Two regressions from the earlier marker-size work, fixed:

1. render_points: sizing the datashader canvas to the axes box lowered its resolution, which changed point AGGREGATION (counts/reductions/density): std/var gained spurious nonzero pixels, dots went blocky, colors shifted. Restore the figure-resolution canvas (aggregation identical to before) and instead scale only the marker spread by canvas/axes, so dot size still matches matplotlib in any layout. Colors/aggregation unchanged; size deterministic.
2. circles: the adaptive quad_segs coarsened *visible* discs (they looked octagonal next to the matplotlib circles). Only coarsen sub-pixel discs (≤2px, where coarsening is invisible and where the Visium HD speedup lives, since HD spots are ~0.3-0.6px); any visible disc keeps the round default. HD spots stay sub-pixel → quad_segs=4 → speedup preserved (91k still ~0.58s per coordinate system).
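With this fix the circle tiering collapses to a single sub-pixel cut-off. A sketch, assuming the caller already has the largest disc radius in display pixels; `quad_segs_for` is a hypothetical name.

```python
SUBPIXEL_RADIUS_PX = 2.0  # "≤2px": a coarser ring is invisible at this size


def quad_segs_for(max_radius_px: float) -> int:
    """Coarsen only sub-pixel discs; every visible disc keeps shapely's round default."""
    return 4 if max_radius_px <= SUBPIXEL_RADIUS_PX else 16


# Visium HD spots (~0.3-0.6 px) stay on the cheap 17-coordinate ring.
print([quad_segs_for(r) for r in (0.3, 0.6, 2.0, 2.5, 40.0)])  # [4, 4, 4, 16, 16]
```

Dropping the intermediate 8-segment tier trades some speed on mid-sized discs for visual fidelity, which matches the commit's rationale: the speedup that matters lives entirely in the sub-pixel regime.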
The earlier attempts to make render_points datashader markers match matplotlib in multi-panel layouts all regressed real rendering: the axes-box canvas changed point aggregation (std/var gained spurious values, dots went blocky, colors shifted); the canvas/axes spread scaling overshot when a legend shrank the axes; and the data->display transform isn't valid at render time (axis limits not yet set). render_points single-panel rendering already matches matplotlib (~0.95); the multi-panel difference is the figure-vs-axes raster scale, compounded by the test harness squishing figures to a 400x300 thumbnail. Per 'be accurate in real plotting; note and ignore harness artifacts': revert render_points to its original (correct) sizing, restore its baselines, and document the grid caveat. Keep the circle work (Phase 1/2 + conservative quad_segs so visible circles stay round).
…original sizing render_points reverted to its original sizing, so the grid baseline (previously the broken axes-box version) is regenerated. The documented multi-panel/harness size difference between the matplotlib and datashader columns is expected; single-panel rendering matches.
Datashader markers shrank in multi-panel subplots: the spread radius used sqrt(size)*dpi/100 on a figure-resolution canvas, so the on-screen size scaled with axes_window/figure and halved in a 2x2 grid. Rescale the spread by the axes-box/canvas factor ratio so the displayed radius stays at the matplotlib marker radius (sqrt(size)*dpi/144) in any layout. Unifies the render_points and as_markers paths (ratio is 1 for the axes-box canvas) and drops the 144-vs-100 split. Aggregation canvas is unchanged, so std/var/count are unaffected.
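The rescaling can be sketched as follows, assuming the canvas spans the figure width and the axes occupies a known pixel width. `canvas_spread_px` is hypothetical, not the PR's code: the spread drawn on the canvas is inflated by canvas/axes so that, after the raster is shrunk into the axes, the radius lands back on the matplotlib marker radius.

```python
import math


def canvas_spread_px(s: float, dpi: float, canvas_width_px: float, axes_width_px: float) -> float:
    target_px = math.sqrt(s) * dpi / 144                  # matplotlib marker radius on screen
    return target_px * canvas_width_px / axes_width_px    # undo the figure->axes shrink


for axes_w in (800, 400):  # full-width axes vs half-width subplot
    spread = canvas_spread_px(25, 144, canvas_width_px=800, axes_width_px=axes_w)
    shown = spread * axes_w / 800  # radius left after imshow into the axes
    print(axes_w, spread, shown)   # shown is 5.0 in both layouts
```

When the canvas is already the axes box (the as_markers path), the ratio is 1 and the spread is just the marker radius, which is why both paths can share one formula. The aggregation canvas itself is untouched, so reductions such as std/var/count do not change.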
… size Datashader markers now match matplotlib in any panel layout. Three baselines shifted (multi-panel grid, multi-panel groups/na_color, and the dpi size-agree test); all single-panel datashader baselines stayed within tolerance and shapes/labels centroid baselines are unchanged (axes-box ratio is 1).
…radius Two pre-existing datashader fidelity issues exposed by the render-permutation grids:

1. render_points continuous color defaulted to reduction "sum", which inflates the normalization range where dots overlap and pushes single points to the dark end of the colormap (datashader looked nothing like matplotlib). Switch the default to "max" (each pixel shows its own value, matching matplotlib and the as_points path). The spread step also has to follow the *resolved* reduction: it defaulted to "add" for ds_reduction=None, summing overlapping dilated dots and undoing the "max" aggregate. Now the spread how uses `ds_reduction or default_reduction`.
2. as_points=True on uniform-radius circles now sizes the datashader dots to the true disc radius, so they match the geometry render (and the circle fast path). The matplotlib backend keeps the marker `size` (scatter markers are display-sized, not data-sized), documented as an expected backend difference.
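A toy illustration of why "sum" darkens isolated points: three pixels, the last holding three overlapping points. Plain Python stands in for datashader's reductions; `normalize` is a hypothetical min-max normalization like the one a linear colormap applies, and `resolve_spread_how` only restates the `ds_reduction or default_reduction` rule from the commit.

```python
def normalize(values):
    """Min-max normalize to [0, 1], as a linear colormap would."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


pixels = [[1.0], [2.0], [4.0, 4.0, 4.0]]  # last pixel: three overlapping points

summed = [sum(p) for p in pixels]  # [1.0, 2.0, 12.0]: overlap stretches the range
maxed = [max(p) for p in pixels]   # [1.0, 2.0, 4.0]: each pixel keeps its own value

print(normalize(summed))  # isolated 2.0 lands near 0.09, at the dark end
print(normalize(maxed))   # isolated 2.0 lands near 0.33, where its value belongs


def resolve_spread_how(ds_reduction, default_reduction="max"):
    # Follow the resolved reduction rather than falling back to "add",
    # which would re-sum overlapping dilated dots.
    return ds_reduction or default_reduction
```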
…radius Continuous datashader points now use the "max" reduction (full colormap range instead of sum-darkened), and uniform-circle as_points dots are sized to the true radius on the datashader backend. Regenerate the six affected continuous point baselines and the shapes as_points datashader baseline; clarify the as_points test docstrings (matplotlib stays size-based, an expected backend difference).
Drop the datashader-only circle-radius override for as_points: it made the same render_shapes(as_points=True, size=...) call diverge between backends (datashader discs vs matplotlib markers). as_points is a size-controlled speedup; both backends now use the marker size, matching each other (as the polygon permutation grid already demonstrates). Restores the pre-override as_points datashader baseline. Keeps the layout-invariant marker-size fix and the faithful continuous-color reduction, which are what make the backends agree.
- _datashader_points default_reduction default "sum" -> "max" to match both call sites (removes a latent footgun: a future caller omitting it would silently re-inflate continuous color).
- Drop the duplicate ax.get_window_extent() in the marker-spread branch; make the factor == factor_axesbox (ratio 1) identity explicit for as_markers.
- Trim the one-liner helper docstrings/comments to the load-bearing why.
- Strengthen tests: the gate test covers a NaN radius; the fast-path test spies on the centroid renderer to prove the fast path actually fired (not just that an image was produced); soften the layout-invariance docstring to the real <1px guarantee.
- Extract _affine_major_scale() for the SVD major-axis stretch duplicated by the fast path and _circle_buffer_quad_segs.
- Fast path: coerce only the first radius value (the gate guarantees a uniform, finite radius) instead of re-coercing the whole column, which drops an O(n) pass at HD scale.
- Drop a comment that restated the adjacent log line; drop a redundant bool().
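For the 2x2 linear part of an affine transform, the "SVD major-axis stretch" is the largest singular value, which has a closed form. A dependency-free sketch follows; `affine_major_scale` is a hypothetical stand-in, not the PR's `_affine_major_scale()`, and it assumes the transform is passed as its four linear coefficients.

```python
import math


def affine_major_scale(a: float, b: float, c: float, d: float) -> float:
    """Largest singular value of [[a, b], [c, d]], the linear part of a 2D affine."""
    frob2 = a * a + b * b + c * c + d * d           # trace(A^T A)
    det = a * d - b * c                             # det(A^T A) = det(A)**2
    disc = max(frob2 * frob2 - 4 * det * det, 0.0)  # clamp tiny negative rounding
    return math.sqrt((frob2 + math.sqrt(disc)) / 2)


print(affine_major_scale(3, 0, 0, 2))   # 3.0: anisotropic scale, the major axis wins
print(affine_major_scale(0, -2, 2, 0))  # 2.0: rotation does not change the stretch
```

Multiplying a data-space radius by this factor gives the largest on-screen radius any disc can reach under the transform, the conservative quantity both a quad_segs choice and a spread size would need.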