fix(store): preserve per-volume DRBDMinor across VD-scoped REST modify (Bug 433)#177
fix(store): preserve per-volume DRBDMinor across VD-scoped REST modify (Bug 433)#177Andrei Kvapil (kvaps) wants to merge 3 commits into
Conversation
…Bug 433) A legal in-bounds `vd set-size` grow or `vd set-property` silently dropped RD.Spec.VolumeDefinitions[].DRBDMinor — the /dev/drbd<N> device identity. Both VD-scoped write paths (volumeDefinitions.Update and PatchVolumeDefinitionSpec) rebuild the inline entry through wireToCRDVD, whose wire shape apiv1.VolumeDefinition carries no DRBDMinor, so the round-trip zeroed it. In isolation the allocator re-picks the same minor, but once a lower minor is freed by routine RD churn the controller re-stamps a DIFFERENT minor — a permanent device-identity change on a live resized volume, violating the CRD's "a non-nil minor is NEVER overwritten … preserved across a REST modify" contract. Route both write-backs through wireToCRDVDPreserving, which carries the operator-only DRBDMinor across the rebuild. Same wire-rebuild-drops-operator-only-field class as the carry-across family (Bug 206/208/209); the RD-scoped path preserved the whole VolumeDefinitions slice wholesale, which is why only the VD-scoped element rebuild dropped it. Register ResourceDefinitionVolume in the typed-field carry-across tripwire — it was wrongly documented as having no operator-only fields, the gap that let this ship — so a future field gets a build-stop failure unless it is explicitly classified. Regression (fail-on-bug, store layer): TestBug433_* assert the minor survives resize / props / Update. They FAIL on the pre-fix tree (minor comes back nil) and PASS with the carry-across. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
End-to-end (REST handler -> store -> controller allocator) fail-on-bug regression for the per-volume DRBDMinor drop on a VD-scoped modify. Scenario: RD-A grabs minor 20000, RD-B 20001; delete RD-A to free the lower minor; issue a legal in-bounds `vd set-size` grow (and, in the sibling test, a `vd set-property`) on RD-B vol0. On the pre-fix tree the minor diverges 20001 -> 20000 (the freed lower minor); with the fix it stays 20001. Complements the store-layer unit by proving the whole path does not re-stamp a live volume`s device identity. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
… 433) Per the CLI-bug-fix protocol a CLI-verb fix is not closed until the L6 cli-matrix cell and L7 replay land in the same PR. - vd-modify-preserves-drbd-minor.sh (L6): at the operator-CLI level, create rd-a (lowest minor) + rd-b (next), delete rd-a to free the lower minor, then `vd set-size` and `vd set-property` on rd-b — assert its DRBDMinor AND /dev/drbd<N> device path stay put over a settle window. Captures the minors from the live RD spec, so it needs no hard-coded values and skips cleanly on a dirty minor range. - vd-modify-preserves-drbd-minor.yaml (L7): the same operator sequence as a replay workflow, asserting minor stability via a new await kind. - replay harness: add the `drbd_minor` await kind (reads RD.Spec.VolumeDefinitions[vol].drbdMinor; pair with hold_s for stability) + document it in the runner header. On-stand run is DEFERRED: the DRBD stand is unavailable this session, so these are authored but not executed here. The store-layer unit and the envtest integration test are the deterministic fail-on-bug proof. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This pull request resolves Bug 433 by ensuring that the operator-only DRBDMinor field (the DRBD device identity) is preserved during volume-definition-scoped updates and patches. The fix introduces a wireToCRDVDPreserving helper function to carry the minor across the wire rebuild, and adds extensive unit, integration, and end-to-end replay tests to prevent regression. The review feedback suggests adding a defensive nil check for the prev pointer in the new helper function to prevent potential nil pointer dereferences.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| func wireToCRDVDPreserving(prev *crdv1alpha1.ResourceDefinitionVolume, wire *apiv1.VolumeDefinition) crdv1alpha1.ResourceDefinitionVolume { | ||
| out := wireToCRDVD(wire) | ||
| out.DRBDMinor = prev.DRBDMinor | ||
|
|
||
| return out | ||
| } |
There was a problem hiding this comment.
To enforce defensive programming and prevent potential nil pointer dereferences in the future, it is safer to add a nil check for the prev pointer before accessing its fields. While the current callers pass a non-nil pointer, safeguarding this helper function ensures robustness against future refactoring or reuse.
| func wireToCRDVDPreserving(prev *crdv1alpha1.ResourceDefinitionVolume, wire *apiv1.VolumeDefinition) crdv1alpha1.ResourceDefinitionVolume { | |
| out := wireToCRDVD(wire) | |
| out.DRBDMinor = prev.DRBDMinor | |
| return out | |
| } | |
| func wireToCRDVDPreserving(prev *crdv1alpha1.ResourceDefinitionVolume, wire *apiv1.VolumeDefinition) crdv1alpha1.ResourceDefinitionVolume { | |
| out := wireToCRDVD(wire) | |
| if prev != nil { | |
| out.DRBDMinor = prev.DRBDMinor | |
| } | |
| return out | |
| } |
What
A legal, in-bounds VD-scoped modify —
vd set-size(grow) orvd set-property— silently dropped the per-volumeDRBDMinor(the/dev/drbd<N>device identity) fromRD.Spec.VolumeDefinitions[].Both VD-scoped store write paths (
volumeDefinitions.UpdateandPatchVolumeDefinitionSpec) rebuild the inline entry throughwireToCRDVD, whose wire shapeapiv1.VolumeDefinitioncarries noDRBDMinor— so the round-trip zeroed it. In isolation the controller re-allocates the same minor (self-heals), but once a lower minor is freed by routine RD churn, the allocate-if-nil pass hands the resized volume that freed lower minor instead. The result is a permanent device-identity change on a live volume, driven purely by a resize / property edit — violating the CRD's own contract that a non-nil minor "is NEVER overwritten … preserved across a REST modify".This is the same wire-rebuild-drops-operator-only-field class as the carry-across family (Bug 206 / 208 / 209). The RD-scoped path preserves the whole
VolumeDefinitionsslice wholesale (so the minor rode along for free); only the VD-scoped element rebuild dropped it.Fix
Route both write-backs through a new
wireToCRDVDPreserving, which carries the operator-onlyDRBDMinoracross the rebuild.ResourceDefinitionVolumeis now registered in the typed-field carry-across tripwire — it was wrongly documented as having no operator-only fields, the gap that let this ship — so a future operator-only field on the VD element is a build-stop failure unless it is explicitly classified.Tests (fail-on-bug, proven both directions)
pkg/store/k8s/drbd_minor_carry_bug433_test.go): the minor survives resize / props / Update. FAILs on the pre-fix tree (minor comes back nil), PASSes with the carry-across.tests/integration/group_blue_drbd_minor_bug433_test.go): the full REST → store → controller path. Pre-fix the minor diverges20001 → 20000(the freed lower minor); with the fix it holds at20001.vd-modify-preserves-drbd-minor.{sh,yaml}+ a newdrbd_minorreplay await kind): the operator-CLI codification per the CLI-bug-fix protocol. On-stand execution is deferred — the DRBD stand is unavailable this session — so these are authored but not run here; the store unit and the envtest integration are the deterministic proof.Each regression was verified to FAIL on the pre-fix tree and PASS with the fix.