Skip to content

Fix zstd decompression of replays with trailing bytes#281

Merged
Brutus5000 merged 1 commit into
developfrom
fix/zstd-trailing-bytes-decompression
Jun 26, 2026
Merged

Fix zstd decompression of replays with trailing bytes#281
Brutus5000 merged 1 commit into
developfrom
fix/zstd-trailing-bytes-decompression

Conversation

@Brutus5000

@Brutus5000 Brutus5000 commented Jun 26, 2026

Copy link
Copy Markdown
Member

Problem

Updating zstd-jni past 1.5.2-5 broke replay parsing: LoadReplayLoaderTest.parseBinary03 (and our production scenario) failed with com.github.luben.zstd.ZstdIOException: Unknown frame descriptor. Reported upstream as luben/zstd-jni#301.

Root cause

The zstd_reference.fafreplay payload (and matching older production files) is a complete, valid zstd frame followed by one stray trailing byte (0x0a, a newline) that is not part of the zstd stream.

  • zstd-jni ≤ 1.5.2 silently ignored bytes left over after a finished frame.
  • libzstd 1.5.4+ treats any post-frame bytes as the start of a second frame, reads the 0x0a, can't match a frame magic, and throws Unknown frame descriptor.

This is a behavior change in upstream libzstd's streaming decoder (the zstd-jni Java wrapper is byte-identical across these versions, and one-shot/CLI decoding has always rejected trailing data). The zstd CLI of every version rejects the same file with unknown header.

setContinuous(true) does not help — the frame is complete; the failure is parsing the (non-existent) next frame.

Fix

In the ZSTD branch of decompress(...), read exactly the frame's advertised content size so the decompressor stops at the frame boundary and never touches the trailing byte:

long contentSize = Zstd.getFrameContentSize(inputArray);
if (contentSize > 0) {
  IOUtils.copyLarge(compressorInputStream, out, 0, contentSize);
} else {
  IOUtils.copy(compressorInputStream, out);
}

The fallback covers frames without a stored content size (e.g. streamed files from the Rust replay writer, which carry no trailing junk).

The same decompress method is duplicated in ReplayLoader and ReplayDataParser; both are fixed to avoid drift.

Bumped zstd-jni to 1.5.6-10.

Testing

All 51 :data module tests pass, including parseBinary03 (zstd reference replay) which previously failed on every version after 1.5.2.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of replay files compressed with Zstandard, including files that contain extra trailing bytes.
    • Decompression now uses the expected frame size when available, helping avoid incomplete or over-read replay data.
    • Updated the bundled Zstandard library to a newer version for better compatibility and reliability.

Some (older) replay files contain a stray trailing byte (a newline) after
the zstd frame. zstd-jni <= 1.5.2 silently ignored bytes after a finished
frame, but newer libzstd treats any post-frame bytes as the start of a
second frame and fails with "Unknown frame descriptor". This is a streaming
decoder behavior change in upstream libzstd; the zstd-jni Java wrapper is
unchanged and one-shot/CLI decoding always rejected trailing data.

Read exactly the frame's advertised content size so the decompressor stops
at the frame boundary and never touches the trailing bytes. Falls back to a
full read when the content size is not stored in the header (e.g. streamed
files, which carry no trailing junk).

This unblocks updating zstd-jni past 1.5.2; bumped to 1.5.6-10.

Refs: luben/zstd-jni#301

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The PR updates replay Zstandard decompression in two code paths to use frame content size when available and fall back to stream copying otherwise. It also bumps the zstd-jni dependency version.

Changes

Zstd replay decompression

Layer / File(s) Summary
Bounded Zstd copy
data/src/main/java/com/faforever/commons/replay/ReplayDataParser.java, data/src/main/java/com/faforever/commons/replay/ReplayLoader.java, gradle/libs.versions.toml
Both replay decompression paths now check Zstd.getFrameContentSize(...) and copy a bounded number of bytes when the frame size is known, otherwise they copy until EOF; the zstd-jni version is updated.

🎯 2 (Simple) | ⏱️ ~10 minutes

🐇 I hopped through bytes with careful flair,
Measured each frame with tiny rabbit care.
When zstd trails left crumbs in the stream,
I copied just right—what a tidy dream!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the main change: fixing Zstd replay decompression when trailing bytes follow the frame.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/zstd-trailing-bytes-decompression

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
data/src/main/java/com/faforever/commons/replay/ReplayLoader.java (1)

109-140: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚖️ Poor tradeoff

Consider extracting the shared decompress logic.

This decompress method (including the ZSTD bounded-copy block) is duplicated verbatim in ReplayDataParser. The PR had to edit both copies in lockstep to keep them aligned, which is exactly the maintenance hazard duplication creates. Extracting it into a shared helper (e.g. a ReplayCompression utility) would prevent the two paths from drifting.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@data/src/main/java/com/faforever/commons/replay/ReplayLoader.java` around
lines 109 - 140, The decompression logic is duplicated between
ReplayLoader.decompress and ReplayDataParser, including the ZSTD bounded-copy
handling, so changes must be made in two places. Extract this logic into a
shared helper utility (for example, a ReplayCompression class/method) and have
both callers delegate to it. Keep the existing behavior for QTCOMPRESS, ZSTD,
and UNKNOWN in the shared implementation so both paths stay aligned.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@data/src/main/java/com/faforever/commons/replay/ReplayLoader.java`:
- Around line 109-140: The decompression logic is duplicated between
ReplayLoader.decompress and ReplayDataParser, including the ZSTD bounded-copy
handling, so changes must be made in two places. Extract this logic into a
shared helper utility (for example, a ReplayCompression class/method) and have
both callers delegate to it. Keep the existing behavior for QTCOMPRESS, ZSTD,
and UNKNOWN in the shared implementation so both paths stay aligned.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 57d87661-444d-499b-99e0-ddd1753c1f00

📥 Commits

Reviewing files that changed from the base of the PR and between 2e8a5a8 and 3a662ca.

📒 Files selected for processing (3)
  • data/src/main/java/com/faforever/commons/replay/ReplayDataParser.java
  • data/src/main/java/com/faforever/commons/replay/ReplayLoader.java
  • gradle/libs.versions.toml

@Brutus5000 Brutus5000 merged commit 70f1098 into develop Jun 26, 2026
2 checks passed
@Brutus5000 Brutus5000 deleted the fix/zstd-trailing-bytes-decompression branch June 26, 2026 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant