Skip to content

Expose GeoTIFF loader as an xarray backend engine (#3365)#3375

Open
brendancol wants to merge 6 commits into
mainfrom
issue-3365
Open

Expose GeoTIFF loader as an xarray backend engine (#3365)#3375
brendancol wants to merge 6 commits into
mainfrom
issue-3365

Conversation

@brendancol

@brendancol brendancol commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Summary

Closes #3365.

  • Registers open_geotiff as an xarray backend under the engine name xrspatial, so GeoTIFF/COG/VRT sources open through xr.open_dataset(..., engine="xrspatial") and xr.open_mfdataset(...).
  • open_geotiff returns a DataArray; the entry point promotes it to a one-variable Dataset. The variable name is the source stem, or band_data for an unnamed file-like source.
  • guess_can_open claims .tif, .tiff, and .vrt (including COG URLs that carry a query string), so the engine auto-detects those extensions without an explicit engine=.

Design notes

  • Engine name is xrspatial, as the issue suggested. A bare geotiff risks clashing with other plugins.
  • GeoTIFF read options (gpu, masked, band, overview_level, window, bbox, ...) forward through backend_kwargs. chunks is the one exception: xarray reserves it as a top-level open_dataset argument, so it has to be passed directly rather than through backend_kwargs. The docstring and reference docs call this out.
  • open_dataset_parameters is declared explicitly. That keeps xarray's signature introspection happy with the **kwargs forwarding, and it stops xarray from injecting its CF decoders, in particular mask_and_scale, which would otherwise collide with open_geotiff's deprecated alias of the same name.
  • The backend module imports open_geotiff lazily inside open_dataset, so loading the backend itself stays cheap.

Backend coverage

The engine forwards to open_geotiff, which already dispatches across numpy, cupy, dask+numpy, and dask+cupy from its own parameters (gpu=, chunks=). No backend-specific code is added here.

Test plan

  • open_dataset returns a one-variable Dataset; data, dims, coords, and georeferencing attrs match open_geotiff.
  • file-like source falls back to band_data.
  • backend_kwargs reach open_geotiff; top-level chunks= yields a dask-backed variable.
  • drop_variables is honored; open_mfdataset over two files concatenates.
  • guess_can_open matches the extensions and rejects non-matching and non-string inputs.
  • entry-point registration resolves to the backend class (runs in CI's fresh install; skips on a stale local editable install).

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Expose GeoTIFF loader as an xarray backend engine

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • open_mfdataset doesn't combine cleanly under the default per-file naming. Each file's variable takes its stem (open_geotiff's default_name=None derives the name from the source), so xr.open_mfdataset(["a.tif","b.tif"], engine="xrspatial_geotiff", combine="nested", concat_dim="tile") returns a Dataset with two variables (a, b), each NaN-filled on the other's slice, rather than one concatenated variable. A shared name fixes it: backend_kwargs={"default_name": "band_data"}. README.md:204 and docs/source/reference/geotiff.rst both show xr.open_mfdataset('*.tif', engine='xrspatial_geotiff') with no shared name, which will surprise users. Document the shared-name requirement next to the open_mfdataset example, and strengthen test_open_mfdataset (test_xarray_backend_3365.py:102) to assert the single-variable case via default_name so the realistic path is covered.

Nits (optional improvements)

  • guess_can_open in _xarray_backend.py claims .tif/.tiff. If rioxarray's rasterio backend is also installed and also guesses these, a bare xr.open_dataset("x.tif") with no engine= raises xarray's "found the following matching engines" error. That's expected xarray behavior and the issue did ask for autodetection, but a one-line doc note that bare auto-detection can be ambiguous when another raster backend is installed would save a confused bug report.

What looks good

  • open_dataset_parameters is declared explicitly. That sidesteps xarray's detect_parameters error on the **kwargs forwarding and stops xarray from injecting CF decoders, in particular mask_and_scale, which would collide with open_geotiff's deprecated alias. Good catch.
  • Lazy import of open_geotiff inside open_dataset keeps loading the backend module cheap.
  • guess_can_open handles PathLike, case-insensitivity, and URL query strings.
  • Tests are thorough; the entry-point registration test skips on a stale editable install and asserts in CI.

Checklist

  • Algorithm matches reference/paper: n/a (integration wrapper)
  • All implemented backends consistent: n/a (forwards to open_geotiff)
  • NaN handling correct: n/a
  • Edge cases covered by tests: yes (file-like fallback, drop_variables, non-string guess input)
  • Dask chunk boundaries: n/a (top-level chunks= path tested)
  • No premature materialization: yes
  • Benchmark exists or not needed: not needed
  • README feature matrix updated: yes (example block)
  • Docstrings present and accurate: yes

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after cf34c6c)

Both findings from the first pass are addressed:

  • open_mfdataset naming (Suggestion): fixed. README.md and docs/source/reference/geotiff.rst now pass backend_kwargs={"default_name": "band_data"} in the open_mfdataset example and explain that without a shared name each file's variable takes its source stem (one variable per file instead of one combined variable). test_open_mfdataset now uses a shared default_name and asserts list(ds.data_vars) == ["band_data"], so the realistic single-variable path is covered.
  • Autodetection ambiguity (Nit): fixed. docs note that bare auto-detection raises when another raster backend (e.g. rioxarray's rasterio) also claims .tif/.tiff, and that an explicit engine= resolves it.

Tests: 17 passed, 1 skipped (entry-point registration test skips on the local editable install; runs in CI). No blockers remain.

@brendancol

Copy link
Copy Markdown
Contributor Author

Renamed the backend engine from xrspatial_geotiff to xrspatial (commit b7a47bc) per maintainer preference. guess_can_open already scopes the engine to .tif/.tiff/.vrt by extension, so the broader engine name relies on extension detection to claim the right sources. Updated setup.cfg, the module docstring, README, docs, and tests; the PR description and follow-up #3376 now use the new name too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Expose GeoTIFF loader as an xarray backend engine

1 participant