Skip to content

fix(rest): enforce size bounds on VD resize to match create (Bug 155)#175

Draft
Andrei Kvapil (kvaps) wants to merge 2 commits into
mainfrom
blue/round4-vd-resize-bounds
Draft

fix(rest): enforce size bounds on VD resize to match create (Bug 155)#175
Andrei Kvapil (kvaps) wants to merge 2 commits into
mainfrom
blue/round4-vd-resize-bounds

Conversation

@kvaps

Copy link
Copy Markdown
Member

What

The VD resize path (PUT .../volume-definitions/{vn}, i.e. linstor vd set-size) did not enforce the [4 MiB, 16 TiB] size bounds the create path enforces via validateVDSize (Bug 155). A below-floor force-shrink (e.g. 3 MiB) or an over-ceiling grow (e.g. 16 TiB + 1 GiB) was accepted with 200 OK and stored verbatim, leaving a VolumeDefinition spec the satellite can never materialise — it hot-loops on drbdadm create-md / device resize indefinitely. This is the exact Bug 155 failure mode reached through the resize verb instead of create, and it does not self-heal.

Why it mattered

linstor vd set-size and linstor vd c should be symmetric on physical size bounds. The create path already refuses out-of-range sizes (TestBug155VDCreateRefusesAbsurdSize); the resize path silently accepted them, so a realistic operator or CSI resize could wedge a resource.

Fix

Gate size_kib in rejectVDPatchSize, reusing the create path's validateVDSize + writeVDSizeRejection so the rejection envelope (400 + FAIL_INVLD_VLM_SIZE, with cause/correction) is byte-identical across the two verbs. The bounds check runs before the shrink-vs-force gate — consistent with the existing Bug 383 non-positive floor — because force waives the shrink-direction opt-in, never the physical floor/ceiling; checking bounds first gives the operator the accurate "invalid size" error instead of an "add --force" hint on a size that would be refused even with force. A prop-only PUT (no size_kib) is untouched.

Tests (fail-on-bug, proven both directions)

  • L1 handler (pkg/rest/vd_put_size_bounds_round4_test.go): a below-floor force-shrink and an over-ceiling grow are refused with 400 + FAIL_INVLD_VLM_SIZE and leave the stored size untouched; inclusive-boundary sizes (exactly 4 MiB / 16 TiB) still land 200.
  • Integration / envtest (tests/integration/vd_resize_bounds_test.go): the same two scenarios driven through the real apiserver + store round-trip, asserting both the wire rejection and that the durable spec is unchanged.

Both fail on the pre-fix tree (PUT returns 200 and mutates the stored spec) and pass with the gate.

Heads-up for reviewers

TestVolumeDefinitionsUpdateLargeSizeKibRoundTrip previously used 1 PiB — above the 16 TiB DRBD ceiling the create path already refuses — so it was inadvertently pinning the very create/resize asymmetry this PR closes. Its genuine purpose (a multi-TiB size_kib survives the wire without int32 truncation) is preserved by switching to the largest in-bounds size (16 TiB, still ~8× the int32 clamp point).

Scope

REST-layer input-validation fix, fully proven at the L1 + integration (envtest) tiers. The L6 cli-matrix / L7 replay runs on the live DRBD stand are deferred (stand unavailable); this change does not touch the satellite/DRBD data plane.

The VD resize path (PUT .../volume-definitions/{vn}, `linstor vd
set-size`) did not enforce the [4 MiB, 16 TiB] size bounds the create
path enforces via validateVDSize (Bug 155). A below-floor force-shrink
or an over-ceiling grow was accepted (200) and stored verbatim, leaving
a spec the satellite can never materialise — it hot-loops on `drbdadm
create-md` / device resize forever, the exact Bug 155 failure mode
reached through the resize verb. It does not self-heal.

Gate size_kib in rejectVDPatchSize, reusing the create path's
validateVDSize + writeVDSizeRejection so the rejection envelope is
byte-identical across the create and resize verbs. The check runs
before the shrink-vs-force gate (like the Bug 383 non-positive floor):
force waives the shrink-direction opt-in, never the physical
floor/ceiling.

Regression tests (fail on the pre-fix tree, pass with the gate): L1
handler tests assert a below-floor force-shrink and an over-ceiling
grow are refused with 400 + FAIL_INVLD_VLM_SIZE and leave the stored
size untouched, plus an inclusive-boundary accept; an envtest pair
drives the same two scenarios through the real apiserver round-trip.

The existing LargeSizeKibRoundTrip test used 1 PiB — above the 16 TiB
ceiling the create path already refuses (TestBug155VDCreateRefusesAbsurd
Size) — so it was pinning the very create/resize asymmetry this gate
closes. Its int64-no-truncation guard now uses the largest in-bounds
size (16 TiB), still ~8x the int32 clamp point.

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf5c418d-c37f-46f1-9407-9a1ad7965097

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch blue/round4-vd-resize-bounds

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request mirrors the volume definition size bounds validation (4 MiB to 16 TiB) from the CREATE path onto the RESIZE path (PUT). This prevents out-of-bounds sizes from being stored during a resize, which previously caused satellite hot-loops. Unit and integration tests have been added to verify this behavior. The feedback highlights that reusing the writeVDSizeRejection helper on the resize path may output a misleading correction message referencing the create command. Additionally, the integration tests should be tightened to assert an exact 400 Bad Request status code rather than just checking for non-2xx status codes to prevent false positives during server crashes.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

return false
}

writeVDSizeRejection(w, rd, vn, *patch.SizeKib, sizeErr)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The writeVDSizeRejection helper has a hardcoded correction message (Correc) that instructs the operator to re-issue linstor vd c (the create command). Since this helper is now reused in the resize/PUT path (rejectVDPatchOutOfBounds), this correction message will be misleading to operators attempting a resize. Consider refactoring writeVDSizeRejection to accept the action/verb or make the correction message more generic (e.g., "retry the operation with a valid size").

Comment on lines +124 to +127
if status >= 200 && status < 300 {
t.Fatalf("resize ACCEPTED a sub-floor size %d KiB < %d KiB min (Bug-155 class via vd set-size)",
belowFloor, vdBoundsMinSizeKib)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Asserting that the status code is simply not in the 2xx range is too loose. If the server crashes or returns a 500 Internal Server Error, the test would pass because the stored size remains unchanged. It is better to assert the exact expected status code (400 Bad Request) to verify the API contract strictly.

	if status != http.StatusBadRequest {
		t.Fatalf("status: got %d, want 400 (sub-floor size must be refused like create); body=%s", status, body)
	}

Comment on lines +150 to +153
if status >= 200 && status < 300 {
t.Fatalf("resize ACCEPTED an over-max size %d KiB > %d KiB max (Bug-155 class via vd set-size)",
aboveMax, vdBoundsMaxSizeKib)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Asserting that the status code is simply not in the 2xx range is too loose. If the server crashes or returns a 500 Internal Server Error, the test would pass because the stored size remains unchanged. It is better to assert the exact expected status code (400 Bad Request) to verify the API contract strictly.

	if status != http.StatusBadRequest {
		t.Fatalf("status: got %d, want 400 (over-ceiling size must be refused like create); body=%s", status, body)
	}

Complete the CLI-bug-fix test tiers for the VD resize size-bounds gate
(vd set-size), per the blockstor CLI-bug-fix protocol.

- L6 cli-matrix cell tests/e2e/cli-matrix/vd-resize-bounds-rejected.sh:
  over-ceiling grow via CLI rejected, below-floor force-shrink via raw
  REST PUT rejected (400 + FAIL_INVLD_VLM_SIZE), in-bounds grow lands;
  size unchanged after each rejection.
- L7 replay tests/operator-harness/replay/vd-resize-bounds-rejected.yaml:
  the CLI-driveable operator sequence (over-ceiling reject, below-floor
  reject, in-bounds grow succeeds) with size-unchanged assertions.
- L1: add an in-bounds force-shrink (1 GiB -> 8 MiB) guard so the gate
  cannot regress the legitimate scenario-4.W13 force-shrink path.

The on-live-stand run of the L6 cell and L7 replay is DEFERRED — the
dev-kvaps DRBD oracle is unavailable this session; both files carry a
run-deferred note and a stand-pending task tracks running them. The fix
is already proven end-to-end at the L1 (handler) + integration (real
apiserver via envtest) tiers, which fully cover a REST-layer input
rejection; the L6/L7 tiers validate DRBD-state convergence, which is
N/A for a refused request, so they are the required paper trail rather
than the proof.

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-Authored-By: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant