feat(xenium): import onboard secondary analysis (clustering/PCA/UMAP/diffexp) into the table by Tomatokeftes · Pull Request #405 · scverse/spatialdata-io

Tomatokeftes · 2026-06-18T08:43:08Z

Motivation

xenium() currently imports only the raw outputs (boundaries, transcripts, images, cell-feature matrix). The Xenium Onboard Analysis secondary analysis under analysis/ — graph-based + k-means clustering, PCA, UMAP, and differential expression — is dropped, so the 10x-computed clusters/embeddings have to be recomputed downstream even though they ship with every standard run. I couldn't find an existing issue/PR covering this (the nearby #385 is about Xenium Explorer selection GeoJSON, a different artifact).

What this does

Adds a cells_analysis: bool = True option to xenium() that, when the analysis/ folder is present, enriches the cell table:

| Source | Target |
| --- | --- |
| `analysis/clustering/<name>/clusters.csv` | one categorical column per clustering in `table.obs` (named `<name>`, e.g. `gene_expression_graphclust`) |
| `analysis/pca/<name>/projection.csv` | `table.obsm["X_pca"]` |
| `analysis/umap/<name>/projection.csv` | `table.obsm["X_umap"]` |
| `analysis/diffexp/<name>/differential_expression.csv` | `table.uns["diffexp"][<name>]` |
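
To make the folder layout concrete, here is a minimal sketch of how the `analysis/<kind>/<name>/<file>.csv` paths above could be discovered. The helper `discover_analysis_files` and the `ANALYSIS_LAYOUT` dict are hypothetical names used for illustration only; they are not the PR's implementation.

```python
from pathlib import Path

# Hypothetical lookup: analysis subfolder -> (CSV file name, target slot).
# The paths and slots mirror the mapping above; the dict itself is illustrative.
ANALYSIS_LAYOUT = {
    "clustering": ("clusters.csv", "obs"),
    "pca": ("projection.csv", "obsm"),
    "umap": ("projection.csv", "obsm"),
    "diffexp": ("differential_expression.csv", "uns"),
}


def discover_analysis_files(xenium_dir):
    """Return {(kind, name): csv_path} for every result found under analysis/."""
    analysis_dir = Path(xenium_dir) / "analysis"
    found = {}
    if not analysis_dir.is_dir():
        return found  # missing analysis/ folder: nothing to import
    for kind, (file_name, _slot) in ANALYSIS_LAYOUT.items():
        kind_dir = analysis_dir / kind
        if not kind_dir.is_dir():
            continue  # this kind of result was not exported
        for csv_path in sorted(kind_dir.glob(f"*/{file_name}")):
            # The parent folder name is the <name> placeholder in the mapping.
            found[(kind, csv_path.parent.name)] = csv_path
    return found
```

Returning an empty dict for a missing folder keeps the caller free of special cases, which matches the no-op behaviour described for re-segmented or matrix-only outputs.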

Design notes:

- Everything is joined to the cells by cell_id (the CSV Barcode column == table.obs_names), so it stays aligned with the shapes/table index regardless of row order.
- Cells absent from a given result (e.g. filtered out by QC before clustering) get a missing value (NaN category / NaN obsm row) and are never dropped, so n_obs is unchanged.
- Cluster ids are stored as string categories ("1", "2", …), idiomatic for scanpy/squidpy plotting.
- A missing analysis/ folder (re-segmented data, matrix-only exports) is a no-op.
- Opt-out via cells_analysis=False; requires cells_table=True.
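
The join and missing-value rules above can be sketched with pandas as follows. This is an illustration under assumptions, not the PR's code: `align_clusters` is a hypothetical helper, and the `Cluster` column name is assumed (the description only names the `Barcode` column).

```python
import pandas as pd


def align_clusters(obs_cell_ids, clusters):
    """Align a clusters table (columns Barcode, Cluster) to the table's cells.

    Returns a Categorical with one entry per cell in `obs_cell_ids`, in that
    order. Cells missing from `clusters` become NaN instead of being dropped.
    """
    # Store cluster ids as strings ("1", "2", ...) keyed by barcode.
    labels = clusters.set_index("Barcode")["Cluster"].astype(str)
    # reindex follows the order of obs_cell_ids and inserts NaN for absent cells.
    aligned = labels.reindex(obs_cell_ids)
    # Order categories numerically so "10" sorts after "2".
    categories = sorted(labels.unique(), key=int)
    return pd.Categorical(aligned, categories=categories)


cell_ids = ["aaaa-1", "bbbb-1", "cccc-1", "dddd-1"]
clusters = pd.DataFrame(
    {"Barcode": ["cccc-1", "aaaa-1", "dddd-1"], "Cluster": [2, 1, 1]}
)
column = align_clusters(cell_ids, clusters)  # "bbbb-1" has no cluster
```

Because the result always has one entry per cell, assigning it to an obs column cannot change `n_obs`.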

Tests

Self-contained unit tests for the parser (_add_cells_analysis): join-by-barcode with scrambled/partial rows, missing-cell handling, obsm alignment with NaN rows, and the no-op-when-absent case. No network/example data required. Verified end-to-end on a real Xenium 2.x run (24,005 cells): all 10 clusterings land in obs, X_pca/X_umap in obsm, with the QC-filtered cells correctly left as NaN.
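
For illustration, the obsm-alignment case with scrambled and partial rows might look like the sketch below. `align_projection` is a hypothetical stand-in for the relevant part of the parser, and the `UMAP-1`/`UMAP-2` column names are assumptions.

```python
import numpy as np
import pandas as pd


def align_projection(obs_cell_ids, projection):
    """Return an (n_obs, n_components) array aligned to `obs_cell_ids`.

    Rows for cells absent from `projection` are filled with NaN.
    """
    coords = projection.set_index("Barcode")
    return coords.reindex(obs_cell_ids).to_numpy(dtype=float)


# Rows are scrambled and "bbbb-1" is missing, as if filtered out by QC.
projection = pd.DataFrame(
    {"Barcode": ["cccc-1", "aaaa-1"], "UMAP-1": [3.0, 1.0], "UMAP-2": [4.0, 2.0]}
)
X_umap = align_projection(["aaaa-1", "bbbb-1", "cccc-1"], projection)

assert X_umap.shape == (3, 2)       # one row per cell, none dropped
assert np.isnan(X_umap[1]).all()    # missing cell becomes a NaN row
```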

Notes

New constants added to XeniumKeys; ruff lint + format clean.
Happy to adjust the default (True vs False), the obs column naming, or whether diffexp belongs in uns — flagging those as the main review-judgment calls.

Add a `cells_analysis` option (default True) to `xenium()` that reads the Xenium output's `analysis/` folder into the cell table when present:

- `analysis/clustering/<name>/clusters.csv` -> one categorical column per clustering in `table.obs` (e.g. `gene_expression_graphclust`, `gene_expression_kmeans_10_clusters`), joined to the cells by `cell_id` (the CSV `Barcode`). Cells absent from a clustering (filtered by QC) get a missing value rather than being dropped.
- `analysis/pca/<name>/projection.csv` -> `table.obsm["X_pca"]`.
- `analysis/umap/<name>/projection.csv` -> `table.obsm["X_umap"]`.
- `analysis/diffexp/<name>/differential_expression.csv` -> `table.uns["diffexp"][<name>]`.

Until now `xenium()` imported only the raw outputs (boundaries, transcripts, images, cell-feature matrix); the onboard secondary analysis was dropped, so the 10x-computed clusters/embeddings had to be recomputed downstream. Joining by `cell_id` keeps everything aligned to the shapes/table index. A missing `analysis/` folder is a no-op (e.g. re-segmented data, matrix-only exports).

Adds self-contained unit tests for the parser (join-by-barcode, missing-cell handling, obsm alignment, no-op when the folder is absent).

…ndex The analysis CSVs key on the cell_id barcode; join clustering + projections on the cell_id obs column instead of obs_names so the import stays correct even when the table index is positional rather than the barcode.
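
A small sketch of why joining on the `cell_id` column matters when the index is positional. The helper `attach_by_cell_id` is hypothetical and only illustrates the idea; it is not the code in this commit.

```python
import pandas as pd


def attach_by_cell_id(obs, clusters, column):
    """Add `column` to a copy of obs by looking up each row's cell_id."""
    out = obs.copy()
    # Series.map looks each cell_id up in the index of `clusters`;
    # ids without a match become NaN.
    out[column] = pd.Categorical(out["cell_id"].map(clusters))
    return out


# obs with a positional RangeIndex (0, 1, 2) instead of barcodes as index.
obs = pd.DataFrame({"cell_id": ["aaaa-1", "bbbb-1", "cccc-1"]})
clusters = pd.Series({"cccc-1": "2", "aaaa-1": "1"})  # keyed by barcode

enriched = attach_by_cell_id(obs, clusters, "gene_expression_graphclust")

# Aligning on the positional index instead matches nothing at all.
misaligned = clusters.reindex(obs.index)
```

Here `misaligned` contains only missing values, because the integer positions never equal a barcode, while the lookup through `cell_id` recovers the two known assignments.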

The CLI-completeness test (test_cli_exposes_all_reader_params) requires every xenium() parameter to have a matching click option in xenium_wrapper. Add the --cells-analysis option + param + pass-through for the new cells_analysis kwarg.

codecov-commenter · 2026-06-18T12:22:01Z

Codecov Report

❌ Patch coverage is 90.32258% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 38.02%. Comparing base (a63ca08) to head (687a283).

Files with missing lines	Patch %	Lines
src/spatialdata_io/readers/xenium.py	88.23%	6 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (a63ca08) and HEAD (687a283): HEAD has 2 fewer uploads than BASE (BASE: 3, HEAD: 1).

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #405       +/-   ##
===========================================
- Coverage   63.38%   38.02%   -25.36%     
===========================================
  Files          26       26               
  Lines        3217     3279       +62     
===========================================
- Hits         2039     1247      -792     
- Misses       1178     2032      +854

Files with missing lines	Coverage Δ
src/spatialdata_io/__main__.py	`81.90% <100.00%> (-2.65%)`	⬇️
src/spatialdata_io/_constants/_constants.py	`100.00% <100.00%> (ø)`
src/spatialdata_io/readers/xenium.py	`28.98% <88.23%> (-45.84%)`	⬇️

... and 6 files with indirect coverage changes


Tomatokeftes · 2026-06-18T12:29:45Z

CI triage note: the failures on Python 3.12/3.13 look pre-existing and unrelated to this PR.

The test job passes fully on 3.11.
On 3.12/3.13 the dominant failure is ValueError: Key 'Abc' is not unique, or another case-variant of it exists (~44 occurrences), originating in shared test fixtures and hitting test_generic, test_macsima, test_seqfish, test_visium_hd, test_dataframe, and the Xenium example-data tests collaterally — i.e. readers this PR doesn't touch. It looks like dependency drift (zarr / case-sensitivity validation) since the last green main run.
This PR's diff is confined to xenium.py, __main__.py, and test_xenium.py. The new cells_analysis unit tests pass, and the only failure actually caused by this change (test_cli_exposes_all_reader_params — a missing CLI option for the new cells_analysis param) is now fixed by exposing --cells-analysis.

Happy to rebase once the fixture/dependency issue is addressed on main.

Tomatokeftes added 3 commits June 18, 2026 10:42

Tomatokeftes wants to merge 3 commits into scverse:main from Tomatokeftes:feat/xenium-cells-analysis
