Reject predictor with jpeg2000/lerc/jpeg compression#3372
Merged
Conversation
The TIFF PREDICTOR tag only applies to byte-oriented entropy codecs (deflate/lzw/zstd/lz4/packbits). jpeg/jpeg2000/lerc do their own pixel modelling and GDAL ignores the tag, so applying horizontal/floating-point differencing before these codecs wrote a file whose pixels were corrupted to every spec-compliant reader (GDAL read lerc+predictor=2 with a maxdiff of 64537; jpeg2000+predictor=2 would not open at all). normalize_predictor already received the compression tag but never used it. Gate there so every writer leg (eager strip + tiled, GPU) rejects the combination up front with a clear ValueError. Byte-oriented codecs keep honouring the predictor.
brendancol
commented
Jun 17, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Reject predictor with jpeg2000/lerc/jpeg compression
Blockers
None.
Suggestions
One behavior change worth a line in the PR description (not a code change): jpeg + predictor=2 used to hit the jpeg branch in _prepare_strip (_encode.py:331), which returns before the predictor branch, so that write silently succeeded and produced a file with no predictor applied. It now raises. Raising is the correct, GDAL-consistent behavior, but it moves jpeg+predictor from silently-ignore to hard-error, so it is worth noting so it isn't a surprise.
Nits
_PREDICTOR_INCOMPATIBLE_COMPRESSIONmaps each tag to a lowercase name used only in the error message. Fine as is; a set plus a separate name lookup would be more code for no gain. No change needed.
What looks good
- Single chokepoint. All three writer call sites (
_writer.py:477,_writer.py:911,_writers/gpu.py:696) passcomp_tagfrom_compression_tag, so the guard innormalize_predictorcovers eager strip, eager tiled, and the GPU writer at once. Confirmed by grep. - The gate keys on the canonical integer compression tags, which are the only values those call sites ever pass, so it cannot misfire on an unrecognized compression.
- Codec-scoped, not a blanket ban: deflate/lzw/zstd/lz4/packbits + predictor still pass and round-trip (
test_to_geotiff_predictor2_byte_codec_round_trips). - Default writes use deflate, which is byte-oriented, so the common path is unaffected.
- Tests cover the unit level (
normalize_predictor), the eager writer end-to-end for predictor 2 and 3, the positive round-trip regression guard, and the GPU writer (CUDA-gated). The predictor=3-requires-float check is preserved. - The error message names the offending codec, lists the compatible codecs, and tells the caller how to proceed.
Checklist
- Behavior matches GDAL (PREDICTOR only honored for byte-oriented codecs)
- All writer backends covered by one guard (eager strip/tiled + GPU)
- No NaN/dask/compute path touched (validation-only change)
- Edge cases covered: predictor 1/False pass-through, predictor 3 float check kept
- No premature materialization or copies
- Benchmark not needed (validation guard)
- README matrix not applicable (no new function or backend change)
- Docstring updated to note the new rejection
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3371
The GeoTIFF writer applied the TIFF horizontal / floating-point differencing predictor (PREDICTOR=2/3) to pixel bytes before handing them to the jpeg2000 and lerc codecs, then stamped PREDICTOR=2 in the IFD. GDAL only honors the PREDICTOR tag for byte-oriented entropy codecs (deflate, lzw, zstd, lz4, packbits), so the result was a file whose pixels read back corrupted in any spec-compliant reader. GDAL read lerc+predictor=2 with a maxdiff of 64537; jpeg2000+predictor=2 would not open at all.
normalize_predictoralready received thecompressiontag but never used it. It now rejects predictor 2 or 3 combined with jpeg/jpeg2000/lerc with a clearValueError. Every writer leg (eager strip + tiled, GPU) routes through this function, so the gate covers them all.Backend coverage: eager (numpy) and GPU writer (
_write_geotiff_gpu). dask writes route through the same eager encode path.Test plan
normalize_predictorraises for predictor 2/3 + jpeg/jpeg2000/lerc; passes through for byte codecsto_geotiffraises for predictor=2 + {jpeg2000, j2k, lerc} and predictor=3 + {jpeg2000, lerc}to_geotiffpredictor=2 + {deflate, lzw, zstd} still round-trips (regression guard)Note on jpeg
jpeg + predictor=2previously hit the jpeg branch in_prepare_strip(_encode.py:331), which returns before the predictor branch, so that write silently succeeded with no predictor applied. It now raises, consistent with the jpeg2000/lerc rejection and with GDAL. This is a move from silently-ignore to hard-error for that one combination.