Fix zstd decompression of replays with trailing bytes#281
Conversation
Some (older) replay files contain a stray trailing byte (a newline) after the zstd frame. zstd-jni <= 1.5.2 silently ignored bytes after a finished frame, but newer libzstd treats any post-frame bytes as the start of a second frame and fails with "Unknown frame descriptor". This is a streaming decoder behavior change in upstream libzstd; the zstd-jni Java wrapper is unchanged and one-shot/CLI decoding always rejected trailing data. Read exactly the frame's advertised content size so the decompressor stops at the frame boundary and never touches the trailing bytes. Falls back to a full read when the content size is not stored in the header (e.g. streamed files, which carry no trailing junk). This unblocks updating zstd-jni past 1.5.2; bumped to 1.5.6-10. Refs: luben/zstd-jni#301 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe PR updates replay Zstandard decompression in two code paths to use frame content size when available and fall back to stream copying otherwise. It also bumps the ChangesZstd replay decompression
🎯 2 (Simple) | ⏱️ ~10 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
data/src/main/java/com/faforever/commons/replay/ReplayLoader.java (1)
109-140: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoffConsider extracting the shared
decompresslogic.This
decompressmethod (including the ZSTD bounded-copy block) is duplicated verbatim inReplayDataParser. The PR had to edit both copies in lockstep to keep them aligned, which is exactly the maintenance hazard duplication creates. Extracting it into a shared helper (e.g. aReplayCompressionutility) would prevent the two paths from drifting.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@data/src/main/java/com/faforever/commons/replay/ReplayLoader.java` around lines 109 - 140, The decompression logic is duplicated between ReplayLoader.decompress and ReplayDataParser, including the ZSTD bounded-copy handling, so changes must be made in two places. Extract this logic into a shared helper utility (for example, a ReplayCompression class/method) and have both callers delegate to it. Keep the existing behavior for QTCOMPRESS, ZSTD, and UNKNOWN in the shared implementation so both paths stay aligned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@data/src/main/java/com/faforever/commons/replay/ReplayLoader.java`:
- Around line 109-140: The decompression logic is duplicated between
ReplayLoader.decompress and ReplayDataParser, including the ZSTD bounded-copy
handling, so changes must be made in two places. Extract this logic into a
shared helper utility (for example, a ReplayCompression class/method) and have
both callers delegate to it. Keep the existing behavior for QTCOMPRESS, ZSTD,
and UNKNOWN in the shared implementation so both paths stay aligned.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 57d87661-444d-499b-99e0-ddd1753c1f00
📒 Files selected for processing (3)
data/src/main/java/com/faforever/commons/replay/ReplayDataParser.javadata/src/main/java/com/faforever/commons/replay/ReplayLoader.javagradle/libs.versions.toml
Problem
Updating
zstd-jnipast1.5.2-5broke replay parsing:LoadReplayLoaderTest.parseBinary03(and our production scenario) failed withcom.github.luben.zstd.ZstdIOException: Unknown frame descriptor. Reported upstream as luben/zstd-jni#301.Root cause
The
zstd_reference.fafreplaypayload (and matching older production files) is a complete, valid zstd frame followed by one stray trailing byte (0x0a, a newline) that is not part of the zstd stream.0x0a, can't match a frame magic, and throwsUnknown frame descriptor.This is a behavior change in upstream libzstd's streaming decoder (the zstd-jni Java wrapper is byte-identical across these versions, and one-shot/CLI decoding has always rejected trailing data). The
zstdCLI of every version rejects the same file withunknown header.setContinuous(true)does not help — the frame is complete; the failure is parsing the (non-existent) next frame.Fix
In the
ZSTDbranch ofdecompress(...), read exactly the frame's advertised content size so the decompressor stops at the frame boundary and never touches the trailing byte:The fallback covers frames without a stored content size (e.g. streamed files from the Rust replay writer, which carry no trailing junk).
The same
decompressmethod is duplicated inReplayLoaderandReplayDataParser; both are fixed to avoid drift.Bumped
zstd-jnito1.5.6-10.Testing
All 51
:datamodule tests pass, includingparseBinary03(zstd reference replay) which previously failed on every version after 1.5.2.🤖 Generated with Claude Code
Summary by CodeRabbit