Expose GeoTIFF loader as an xarray backend engine (#3365) #3375
Open
brendancol wants to merge 6 commits into
brendancol commented Jun 17, 2026
brendancol (Contributor, Author) left a comment:
PR Review: Expose GeoTIFF loader as an xarray backend engine
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
`open_mfdataset` doesn't combine cleanly under the default per-file naming. Each file's variable takes its stem (`open_geotiff`'s `default_name=None` derives the name from the source), so `xr.open_mfdataset(["a.tif", "b.tif"], engine="xrspatial_geotiff", combine="nested", concat_dim="tile")` returns a Dataset with two variables (`a`, `b`), each NaN-filled on the other's slice, rather than one concatenated variable. A shared name fixes it: `backend_kwargs={"default_name": "band_data"}`. README.md:204 and docs/source/reference/geotiff.rst both show `xr.open_mfdataset('*.tif', engine='xrspatial_geotiff')` with no shared name, which will surprise users. Document the shared-name requirement next to the `open_mfdataset` example, and strengthen `test_open_mfdataset` (test_xarray_backend_3365.py:102) to assert the single-variable case via `default_name` so the realistic path is covered.
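A minimal sketch of the naming behavior described above, assuming the rule stated in the PR summary: an explicit `default_name` wins, otherwise a path or URL contributes its stem, and an unnamed file-like source falls back to `band_data`. `variable_name` and `combined_variables` are hypothetical helpers written for illustration; they are not part of xrspatial or xarray, and the real combine step happens inside `xr.open_mfdataset`.

```python
import os
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def variable_name(source, default_name=None):
    """Hypothetical model of how each opened file's variable is named."""
    if default_name is not None:
        # A shared name wins, so every file maps onto the same variable.
        return default_name
    if isinstance(source, (str, os.PathLike)):
        # Ignore any URL query string, then keep the stem ("a.tif" -> "a").
        return PurePosixPath(urlsplit(os.fspath(source)).path).stem
    # File-like sources have no stem to borrow.
    return "band_data"


def combined_variables(sources, default_name=None):
    """Distinct variable names a multi-file open would end up holding."""
    names = []
    for source in sources:
        name = variable_name(source, default_name)
        if name not in names:
            names.append(name)
    return names


print(combined_variables(["a.tif", "b.tif"]))                            # ['a', 'b']
print(combined_variables(["a.tif", "b.tif"], default_name="band_data"))  # ['band_data']
```

Without a shared name the two files produce two distinct variables, which is why the combined Dataset carries `a` and `b` separately; with `default_name="band_data"` both files map onto one variable that can be concatenated.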
Nits (optional improvements)
`guess_can_open` in `_xarray_backend.py` claims `.tif`/`.tiff`. If rioxarray's rasterio backend is also installed and also guesses these, a bare `xr.open_dataset("x.tif")` with no `engine=` raises xarray's "found the following matching engines" error. That's expected xarray behavior and the issue did ask for autodetection, but a one-line doc note that bare auto-detection can be ambiguous when another raster backend is installed would save a confused bug report.
What looks good
- `open_dataset_parameters` is declared explicitly. That sidesteps xarray's `detect_parameters` error on the `**kwargs` forwarding and stops xarray from injecting CF decoders, in particular `mask_and_scale`, which would collide with `open_geotiff`'s deprecated alias. Good catch.
- Lazy import of `open_geotiff` inside `open_dataset` keeps loading the backend module cheap.
- `guess_can_open` handles PathLike, case-insensitivity, and URL query strings.
- Tests are thorough; the entry-point registration test skips on a stale editable install and asserts in CI.
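The first point can be modeled with the standard library. The sketch below is a simplified stand-in, written for illustration, for the signature check xarray falls back on when a backend leaves `open_dataset_parameters` unset: it reads parameter names with `inspect.signature` and refuses `*args`/`**kwargs`. `detect_parameters`, `ForwardingBackend`, and `resolve_parameters` are local names, not imports from xarray, and the declared tuple is an assumed example rather than the PR's actual parameter list.

```python
import inspect


def detect_parameters(open_dataset):
    """Simplified model of signature-based parameter detection."""
    names = []
    for name, param in inspect.signature(open_dataset).parameters.items():
        # Catch-all parameters hide the real argument names, so refuse them.
        if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                          inspect.Parameter.VAR_KEYWORD):
            raise TypeError(f"{name!r}: *args and **kwargs are not supported")
        if name != "self":
            names.append(name)
    return tuple(names)


class ForwardingBackend:
    # Declared up front (assumed names), so no introspection is needed.
    open_dataset_parameters = ("filename_or_obj", "drop_variables")

    def open_dataset(self, filename_or_obj, *, drop_variables=None, **kwargs):
        # In a real backend, **kwargs would be forwarded to the loader.
        return filename_or_obj, drop_variables, kwargs


def resolve_parameters(backend):
    """Prefer the declared tuple; introspect only when it is missing."""
    declared = getattr(backend, "open_dataset_parameters", None)
    if declared is not None:
        return tuple(declared)
    return detect_parameters(backend.open_dataset)


backend = ForwardingBackend()
try:
    detect_parameters(backend.open_dataset)
except TypeError as exc:
    print("introspection fails:", exc)
print(resolve_parameters(backend))  # ('filename_or_obj', 'drop_variables')
```

Declaring the tuple means the catch-all `**kwargs` never reaches the introspection step, which is the property the review points to.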
Checklist
- Algorithm matches reference/paper: n/a (integration wrapper)
- All implemented backends consistent: n/a (forwards to open_geotiff)
- NaN handling correct: n/a
- Edge cases covered by tests: yes (file-like fallback, drop_variables, non-string guess input)
- Dask chunk boundaries: n/a (top-level chunks= path tested)
- No premature materialization: yes
- Benchmark exists or not needed: not needed
- README feature matrix updated: yes (example block)
- Docstrings present and accurate: yes
…utodetect ambiguity (#3365)
brendancol commented Jun 17, 2026
brendancol (Contributor, Author) left a comment:
Follow-up review (after cf34c6c)
Both findings from the first pass are addressed:
- `open_mfdataset` naming (Suggestion): fixed. README.md and docs/source/reference/geotiff.rst now pass `backend_kwargs={"default_name": "band_data"}` in the `open_mfdataset` example and explain that without a shared name each file's variable takes its source stem (one variable per file instead of one combined variable). `test_open_mfdataset` now uses a shared `default_name` and asserts `list(ds.data_vars) == ["band_data"]`, so the realistic single-variable path is covered.
- Autodetection ambiguity (Nit): fixed. The docs now note that bare auto-detection raises when another raster backend (e.g. rioxarray's rasterio) also claims `.tif`/`.tiff`, and that an explicit `engine=` resolves it.
Tests: 17 passed, 1 skipped (entry-point registration test skips on the local editable install; runs in CI). No blockers remain.
brendancol (Contributor, Author) commented:
Renamed the backend engine from …
Summary
Closes #3365.
- Exposes `open_geotiff` as an xarray backend under the engine name `xrspatial`, so GeoTIFF/COG/VRT sources open through `xr.open_dataset(..., engine="xrspatial")` and `xr.open_mfdataset(...)`.
- `open_geotiff` returns a `DataArray`; the entry point promotes it to a one-variable `Dataset`. The variable name is the source stem, or `band_data` for an unnamed file-like source.
- `guess_can_open` claims `.tif`, `.tiff`, and `.vrt` (including COG URLs that carry a query string), so the engine auto-detects those extensions without an explicit `engine=`.
Design notes
- The engine is named `xrspatial`, as the issue suggested. A bare `geotiff` risks clashing with other plugins.
- `open_geotiff` keyword arguments (`gpu`, `masked`, `band`, `overview_level`, `window`, `bbox`, ...) forward through `backend_kwargs`. `chunks` is the one exception: xarray reserves it as a top-level `open_dataset` argument, so it has to be passed directly rather than through `backend_kwargs`. The docstring and reference docs call this out.
- `open_dataset_parameters` is declared explicitly. That keeps xarray's signature introspection happy with the `**kwargs` forwarding, and it stops xarray from injecting its CF decoders, in particular `mask_and_scale`, which would otherwise collide with `open_geotiff`'s deprecated alias of the same name.
- The backend imports `open_geotiff` lazily inside `open_dataset`, so loading the backend itself stays cheap.
Backend coverage
The engine forwards to `open_geotiff`, which already dispatches across numpy, cupy, dask+numpy, and dask+cupy from its own parameters (`gpu=`, `chunks=`). No backend-specific code is added here.
Test plan
- `open_dataset` returns a one-variable `Dataset`; data, dims, coords, and georeferencing attrs match `open_geotiff`.
- An unnamed file-like source falls back to the variable name `band_data`.
- `backend_kwargs` reach `open_geotiff`; top-level `chunks=` yields a dask-backed variable.
- `drop_variables` is honored; `open_mfdataset` over two files concatenates.
- `guess_can_open` matches the extensions and rejects non-matching and non-string inputs.
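A self-contained sketch of an extension check with the properties described in this PR (`.tif`, `.tiff`, and `.vrt`; case-insensitive; PathLike accepted; URL query strings ignored; non-string inputs rejected). It is an illustration, not the code in `_xarray_backend.py`, and it is written as a free function rather than a method on an xarray `BackendEntrypoint` so it runs without xarray installed.

```python
import os
from pathlib import Path
from urllib.parse import urlsplit

# Extensions listed in the PR summary.
_EXTENSIONS = (".tif", ".tiff", ".vrt")


def guess_can_open(filename_or_obj):
    """Return True when the input looks like a GeoTIFF/COG/VRT path or URL."""
    if isinstance(filename_or_obj, os.PathLike):
        filename_or_obj = os.fspath(filename_or_obj)
    if not isinstance(filename_or_obj, str):
        # File-like objects, bytes, None, ...: decline instead of guessing.
        return False
    # urlsplit drops "?query" and "#fragment", so signed COG URLs still match.
    path = urlsplit(filename_or_obj).path
    return path.lower().endswith(_EXTENSIONS)


print(guess_can_open(Path("scene.TIFF")))                        # True
print(guess_can_open("https://example.com/cog.tif?sig=abc123"))  # True
print(guess_can_open("scene.nc"))                                # False
```

One trade-off of routing every string through `urlsplit`: a local filename that itself contains `?` or `#` would be truncated before the extension check, so a real implementation may want to parse only inputs that carry a URL scheme.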