Skip to content

Reject predictor with jpeg2000/lerc/jpeg compression#3372

Merged
brendancol merged 2 commits into
mainfrom
issue-3371
Jun 17, 2026
Merged

Reject predictor with jpeg2000/lerc/jpeg compression#3372
brendancol merged 2 commits into
mainfrom
issue-3371

Conversation

@brendancol

@brendancol brendancol commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Closes #3371

The GeoTIFF writer applied the TIFF horizontal / floating-point differencing predictor (PREDICTOR=2/3) to pixel bytes before handing them to the jpeg2000 and lerc codecs, then stamped PREDICTOR=2 in the IFD. GDAL only honors the PREDICTOR tag for byte-oriented entropy codecs (deflate, lzw, zstd, lz4, packbits), so the result was a file whose pixels read back corrupted in any spec-compliant reader. GDAL read lerc+predictor=2 with a maxdiff of 64537; jpeg2000+predictor=2 would not open at all.

  • normalize_predictor already received the compression tag but never used it. It now rejects predictor 2 or 3 combined with jpeg/jpeg2000/lerc with a clear ValueError. Every writer leg (eager strip + tiled, GPU) routes through this function, so the gate covers them all.
  • Byte-oriented codecs keep honouring the predictor; the gate is codec-scoped, not a blanket ban.

Backend coverage: eager (numpy) and GPU writer (_write_geotiff_gpu). dask writes route through the same eager encode path.

Test plan

  • normalize_predictor raises for predictor 2/3 + jpeg/jpeg2000/lerc; passes through for byte codecs
  • to_geotiff raises for predictor=2 + {jpeg2000, j2k, lerc} and predictor=3 + {jpeg2000, lerc}
  • to_geotiff predictor=2 + {deflate, lzw, zstd} still round-trips (regression guard)
  • GPU writer raises for predictor=2 + {jpeg2000, lerc} (CUDA-gated)
  • Full geotiff write suite green (1175 passed)

Note on jpeg

jpeg + predictor=2 previously hit the jpeg branch in _prepare_strip (_encode.py:331), which returns before the predictor branch, so that write silently succeeded with no predictor applied. It now raises, consistent with the jpeg2000/lerc rejection and with GDAL. This is a move from silently-ignore to hard-error for that one combination.

The TIFF PREDICTOR tag only applies to byte-oriented entropy codecs
(deflate/lzw/zstd/lz4/packbits). jpeg/jpeg2000/lerc do their own pixel
modelling and GDAL ignores the tag, so applying horizontal/floating-point
differencing before these codecs wrote a file whose pixels were corrupted
to every spec-compliant reader (GDAL read lerc+predictor=2 with a maxdiff
of 64537; jpeg2000+predictor=2 would not open at all).

normalize_predictor already received the compression tag but never used
it. Gate there so every writer leg (eager strip + tiled, GPU) rejects the
combination up front with a clear ValueError. Byte-oriented codecs keep
honouring the predictor.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Reject predictor with jpeg2000/lerc/jpeg compression

Blockers

None.

Suggestions

One behavior change worth a line in the PR description (not a code change): jpeg + predictor=2 used to hit the jpeg branch in _prepare_strip (_encode.py:331), which returns before the predictor branch, so that write silently succeeded and produced a file with no predictor applied. It now raises. Raising is the correct, GDAL-consistent behavior, but it moves jpeg+predictor from silently-ignore to hard-error, so it is worth noting so it isn't a surprise.

Nits

  • _PREDICTOR_INCOMPATIBLE_COMPRESSION maps each tag to a lowercase name used only in the error message. Fine as is; a set plus a separate name lookup would be more code for no gain. No change needed.

What looks good

  • Single chokepoint. All three writer call sites (_writer.py:477, _writer.py:911, _writers/gpu.py:696) pass comp_tag from _compression_tag, so the guard in normalize_predictor covers eager strip, eager tiled, and the GPU writer at once. Confirmed by grep.
  • The gate keys on the canonical integer compression tags, which are the only values those call sites ever pass, so it cannot misfire on an unrecognized compression.
  • Codec-scoped, not a blanket ban: deflate/lzw/zstd/lz4/packbits + predictor still pass and round-trip (test_to_geotiff_predictor2_byte_codec_round_trips).
  • Default writes use deflate, which is byte-oriented, so the common path is unaffected.
  • Tests cover the unit level (normalize_predictor), the eager writer end-to-end for predictor 2 and 3, the positive round-trip regression guard, and the GPU writer (CUDA-gated). The predictor=3-requires-float check is preserved.
  • The error message names the offending codec, lists the compatible codecs, and tells the caller how to proceed.

Checklist

  • Behavior matches GDAL (PREDICTOR only honored for byte-oriented codecs)
  • All writer backends covered by one guard (eager strip/tiled + GPU)
  • No NaN/dask/compute path touched (validation-only change)
  • Edge cases covered: predictor 1/False pass-through, predictor 3 float check kept
  • No premature materialization or copies
  • Benchmark not needed (validation guard)
  • README matrix not applicable (no new function or backend change)
  • Docstring updated to note the new rejection

@brendancol brendancol merged commit 9a0ae04 into main Jun 17, 2026
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeoTIFF writer: predictor corrupts pixels with jpeg2000/lerc compression

1 participant