Skip to content

fix(rest): validate layer_stack on rg-modify + rd inherit (Bug 434)#178

Draft
Andrei Kvapil (kvaps) wants to merge 3 commits into
mainfrom
blue/round7-rg-layer-validate
Draft

fix(rest): validate layer_stack on rg-modify + rd inherit (Bug 434)#178
Andrei Kvapil (kvaps) wants to merge 3 commits into
mainfrom
blue/round7-rg-layer-validate

Conversation

@kvaps

Copy link
Copy Markdown
Member

What

The RD/RG create paths validate the layer stack (validateLayerStack: allowlist {DRBD, LUKS, STORAGE}, DRBD first, STORAGE terminal), but two sibling paths bypassed it:

  1. rg modify (handleRGUpdate) gated only place_count — an invalid select_filter.layer_stack was merged and persisted unvalidated.
  2. handleRDCreate ran validateRDCreateBody (which validates an explicit layer_list) before inheritLayerStackFromRG — so an RD created against such an RG inherited the invalid stack unchecked.

Net: an invalid, unmaterialisable layer stack the direct create path refuses (400) reached a persisted RD spec via the rg modify → inherit chain. On the stand the satellite/dispatcher cannot materialise an out-of-order layer chain (drbdadm / LVM config error) → spawn failure / hot-loop. Same validate-on-create, bypass-on-modify/inherit asymmetry class as the resize-bounds bypass.

Fix

  1. Extend the rg-update wire gate (rgUpdatePlaceCountGatergUpdateWireGate) to reject an invalid layer_stack with the same 400 the create path returns, validating exactly the value the merge would persist.
  2. Defense-in-depth: re-validate the resolved (possibly inherited) stack in handleRDCreate after the RG inherit — so an already-invalid RG (from any prior path) can't smuggle a bad stack into an RD.

Either gate alone closes the hole; both together also protect against a pre-existing bad RG.

Tests (fail-on-bug, proven both directions)

  • L1 REST unit (pkg/rest/bug_434_rg_layer_stack_validation_test.go): drives the real handler mux over an in-memory store. The rg-modify rejection and the rd-inherit rejection both FAIL on the pre-fix tree (accepted / persisted) and PASS with the gates; healthy paths (valid re-order, absent field) still succeed.
  • Integration / envtest (tests/integration/group_blue_rg_layer_bug434_test.go): the full REST surface (rg create → rg modify → rd create inherit). Pre-fix the RD is persisted with the invalid [STORAGE,DRBD] stack; with the fix no persisted RD carries the invalid ordering.
  • L6 cli-matrix + L7 replay (rg-modify-invalid-layer-rejected.{sh,yaml}): the operator-CLI codification. On-stand execution is deferred — the DRBD stand is unavailable this session — so these are authored but not run here.

Each regression was verified to FAIL on the pre-fix tree and PASS with the fix.

Andrei Kvapil (kvaps) and others added 3 commits July 4, 2026 00:54
The RD/RG create paths validate the layer stack (validateLayerStack:
allowlist {DRBD,LUKS,STORAGE}, DRBD first, STORAGE terminal), but two
sibling paths bypassed it:

  1. `rg modify` (handleRGUpdate) gated only place_count, so an invalid
     select_filter.layer_stack was merged and persisted unvalidated.
  2. handleRDCreate ran validateRDCreateBody (which validates an EXPLICIT
     layer_list) BEFORE inheritLayerStackFromRG, so an RD created against
     such an RG inherited the invalid stack unchecked.

Net: an invalid, unmaterialisable layer stack the direct create path
refuses (400) reached a persisted RD spec via the rg-modify -> inherit
chain; on the stand the satellite can't materialise it (drbdadm/LVM
error) -> spawn failure / hot-loop. Same validate-on-create,
bypass-on-modify/inherit asymmetry class as the resize-bounds bypass.

Fix: (1) extend the rg-update wire gate (rgUpdatePlaceCountGate ->
rgUpdateWireGate) to reject an invalid layer_stack with the same 400 the
create path returns, and (2) defense-in-depth re-validate the resolved
(possibly inherited) stack in handleRDCreate after the RG inherit. Either
gate alone closes the hole; both together also stop an already-invalid RG
(from any prior path) from smuggling a bad stack into an RD.

Regression (fail-on-bug): TestBug434* drive the real handler mux over an
in-memory store — the rg-modify rejection and the rd-inherit rejection
both FAIL on the pre-fix tree (accepted / persisted) and PASS with the
gates. Healthy-path (valid re-order, absent field) still succeeds.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… 434)

End-to-end regression driving the whole REST surface (rg create -> rg
modify -> rd create inherit). Fix-agnostic: passes whether the rg-modify
gate rejects the invalid stack or the rd-create post-inherit
re-validation refuses the RD. On the pre-fix tree the RD is persisted
with the invalid [STORAGE,DRBD] stack inherited from the modified RG ->
FAIL; with the fix no persisted RD carries the invalid ordering.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
…Bug 434)

Per the CLI-bug-fix protocol, author the operator-CLI codification for
the `rg modify` layer-stack gate:

- rg-modify-invalid-layer-rejected.sh (L6): rg c --layer-list drbd,storage
  → `rg modify --layer-list storage,drbd` MUST be rejected → the RG keeps
  its valid stack → an RD inheriting from it gets the valid [DRBD,STORAGE],
  never [STORAGE,DRBD] (read off the RD/RG CRD spec).
- rg-modify-invalid-layer-rejected.yaml (L7): the same operator sequence,
  codified via exit-code contract (the runner has no layer-stack await
  kind; the RD-inherit invariant is asserted at the L6 layer, mirroring
  rg-delete-with-rds-rejected).

On-stand run is DEFERRED (the DRBD stand is unavailable this session);
the rest-handler unit and the envtest integration are the deterministic
fail-on-bug proof.

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b0374757-fb31-4d02-aee0-73e1693cbf0a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch blue/round7-rg-layer-validate

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses Bug 434 by introducing validation gates for layer stacks during resource group updates and resource definition creation, preventing invalid configurations from being persisted or inherited. It also adds extensive unit, integration, and end-to-end tests to verify these fixes. The review feedback correctly identifies a potential issue in the integration test where any error from the Kubernetes client would cause the test to silently pass, and suggests explicitly checking for a 'NotFound' error to avoid false positives.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +179 to +182
err := stack.Env.Client.Get(context.Background(), types.NamespacedName{Name: rdName}, &rd)
if err != nil {
return // RD not persisted (create refused) — invariant holds.
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In TestBug434RGLayerModifyNoInvalidRDInherit, any error returned by stack.Env.Client.Get (such as connection timeouts, RBAC issues, or API server errors) will cause the test to return early and silently pass. To prevent false positives, we should explicitly check if the error is a NotFound error using apierrors.IsNotFound(err) and fail the test for any other error.\n\nNote: You will need to import apierrors "k8s.io/apimachinery/pkg/api/errors" in the imports block of this file.

	err := stack.Env.Client.Get(context.Background(), types.NamespacedName{Name: rdName}, &rd)
	if err != nil {
		if apierrors.IsNotFound(err) {
			return // RD not persisted (create refused) — invariant holds.
		}
		t.Fatalf("failed to get RD: %v", err)
	}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant