Skip to content

CLVM enhancements and fixes#12617

Draft
Pearl1594 wants to merge 90 commits into
mainfrom
clvm-enhancements
Draft

CLVM enhancements and fixes#12617
Pearl1594 wants to merge 90 commits into
mainfrom
clvm-enhancements

Conversation

@Pearl1594

@Pearl1594 Pearl1594 commented Feb 9, 2026

Copy link
Copy Markdown
Contributor

Description

This PR enhances the existing CLVM implementation which was based on the deprecated CLVM technology which was based on corosync/pacemaker. With RHEL 7 having reached EOL, CLVM seems to be broken. CLVM supports RAW volumes on LVM , where as CLVM_NG support QCOW2 on LVM.

Further details: https://cwiki.apache.org/confluence/display/CLOUDSTACK/Modernized+CLVM%3A+Enhancements+and+CLVM_NG+support

NOTE: On testing - it was identified that incremental snapshots for clvm-ng do not work as expected. As of now it's been removed from scope. So, CLVM and CLVM_NG would only support full snapshots.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

@codecov

codecov Bot commented Feb 9, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 3.68%. Comparing base (d3e1976) to head (df61d6f).
⚠️ Report is 15 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (d3e1976) and HEAD (df61d6f). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (d3e1976) HEAD (df61d6f)
unittests 1 0
Additional details and impacted files
@@              Coverage Diff              @@
##               main   #12617       +/-   ##
=============================================
- Coverage     17.90%    3.68%   -14.23%     
=============================================
  Files          5938      454     -5484     
  Lines        532864    38798   -494066     
  Branches      65192     7151    -58041     
=============================================
- Hits          95392     1428    -93964     
+ Misses       426793    37183   -389610     
+ Partials      10679      187    -10492     
Flag Coverage Δ
uitests 3.67% <ø> (+<0.01%) ⬆️
unittests ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16801

@harikrishna-patnala harikrishna-patnala added this to the 4.23.0 milestone Feb 17, 2026
@codecov

codecov Bot commented Feb 17, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 43.19728% with 1336 lines in your changes missing coverage. Please review.
✅ Project coverage is 18.87%. Comparing base (6bc83a3) to head (f438897).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...oud/hypervisor/kvm/storage/ClvmStorageAdaptor.java 36.26% 436 Missing and 14 partials ⚠️
...ud/hypervisor/kvm/storage/KVMStorageProcessor.java 31.46% 97 Missing and 1 partial ⚠️
...torage/motion/StorageSystemDataMotionStrategy.java 13.46% 90 Missing ⚠️
...stack/engine/orchestration/VolumeOrchestrator.java 21.69% 81 Missing and 2 partials ⚠️
...ervisor/kvm/resource/LibvirtComputingResource.java 54.42% 59 Missing and 8 partials ⚠️
...n/java/com/cloud/storage/VolumeApiServiceImpl.java 28.57% 61 Missing and 4 partials ⚠️
...wrapper/LibvirtClvmLockTransferCommandWrapper.java 38.63% 48 Missing and 6 partials ⚠️
...n/java/com/cloud/vm/VirtualMachineManagerImpl.java 16.39% 51 Missing ⚠️
...tack/storage/endpoint/DefaultEndPointSelector.java 56.63% 32 Missing and 17 partials ⚠️
...n/java/com/cloud/storage/clvm/ClvmPoolManager.java 77.10% 25 Missing and 24 partials ⚠️
... and 29 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #12617      +/-   ##
============================================
+ Coverage     18.75%   18.87%   +0.12%     
- Complexity    17966    18201     +235     
============================================
  Files          6160     6170      +10     
  Lines        552578   554819    +2241     
  Branches      67348    67717     +369     
============================================
+ Hits         103660   104748    +1088     
- Misses       437512   438557    +1045     
- Partials      11406    11514     +108     
Flag Coverage Δ
uitests 3.53% <ø> (-0.01%) ⬇️
unittests 20.07% <43.19%> (+0.12%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16875

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 16877

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
Comment on lines +2749 to 2785
// Check if CLVM lock transfer is needed (even if moveVolumeNeeded is false)
// This handles the case where the volume is already on the correct storage pool
// but the VM is running on a different host, requiring only a lock transfer
boolean isClvmLockTransferNeeded = !moveVolumeNeeded &&
isClvmLockTransferRequired(newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLockTransferNeeded) {
// CLVM lock transfer - no data copy, no pool change needed
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lock transfer", "same pool to different host");
} else if (moveVolumeNeeded) {
PrimaryDataStoreInfo primaryStore = (PrimaryDataStoreInfo)newVolumeOnPrimaryStorage.getDataStore();
if (primaryStore.isLocal()) {
throw new CloudRuntimeException(
"Failed to attach local data volume " + volumeToAttach.getName() + " to VM " + vm.getDisplayName() + " as migration of local data volume is not allowed");
}
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
boolean isClvmLightweightMigration = isClvmLightweightMigrationNeeded(
newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLightweightMigration) {
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lightweight migration", "different pools, same VG");
} else {
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this volume move section be extracted to a separate method to reduce indentation?

@Pearl1594 Pearl1594 Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be ok if I addressed this in another PR - with other enhancements that are yet to come.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a straightfoward refactor. I would prefer having it on this one already, but its up to you.

Comment thread server/src/main/java/com/cloud/storage/VolumeApiServiceImpl.java Outdated
@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18208

@blueorangutan

Copy link
Copy Markdown

[SF] Trillian test result (tid-16265)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 94885 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12617-t16265-kvm-ol8.zip
Smoke tests completed. 136 look OK, 15 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File
test_CRUD_operations_userdata Error 1522.99 test_register_userdata.py
test_deploy_vm_with_registered_userdata Error 5.58 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_allow Error 5.55 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_append Error 6.48 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_override_policy_deny Error 5.56 test_register_userdata.py
test_deploy_vm_with_registered_userdata_with_params Error 6.65 test_register_userdata.py
test_link_and_unlink_userdata_to_template Error 5.73 test_register_userdata.py
test_user_userdata_crud Error 9.72 test_register_userdata.py
ContextSuite context=TestResetVmOnReboot>:setup Error 0.00 test_reset_vm_on_reboot.py
ContextSuite context=TestRAMCPUResourceAccounting>:setup Error 0.00 test_resource_accounting.py
test_03_register_template Error 1.13 test_resource_names.py
ContextSuite context=TestRestoreVM>:setup Error 0.00 test_restore_vm.py
ContextSuite context=TestRouterDHCPHosts>:setup Error 0.00 test_router_dhcphosts.py
ContextSuite context=TestRouterDHCPOpts>:setup Error 0.00 test_router_dhcphosts.py
ContextSuite context=TestRouterDns>:setup Error 0.00 test_router_dns.py
ContextSuite context=TestRouterDnsService>:setup Error 0.00 test_router_dnsservice.py
ContextSuite context=TestRouterIpTablesPolicies>:setup Error 0.00 test_routers_iptables_default_policy.py
ContextSuite context=TestVPCIpTablesPolicies>:setup Error 0.00 test_routers_iptables_default_policy.py
ContextSuite context=TestIsolatedNetworks>:setup Error 0.00 test_routers_network_ops.py
ContextSuite context=TestRedundantIsolateNetworks>:setup Error 0.00 test_routers_network_ops.py
ContextSuite context=TestRouterServices>:setup Error 0.00 test_routers.py
ContextSuite context=TestCpuCapServiceOfferings>:setup Error 0.00 test_service_offerings.py
ContextSuite context=TestServiceOfferings>:setup Error 0.30 test_service_offerings.py
ContextSuite context=TestSetSourceNatIp>:setup Error 0.00 test_set_sourcenat.py
ContextSuite context=TestSnapshotRootDisk>:setup Error 0.00 test_snapshots.py
ContextSuite context=TestSnapshotStandaloneBackup>:setup Error 0.00 test_snapshots.py
ContextSuite context=TestSslOffloading>:setup Error 0.00 test_ssl_offloading.py

…ures, change the clvm vols to exclusive on source from shared, and on success, change dest vol to exclusive only for cross-pool migration
@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18209

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Comment on lines +2445 to +2449
Answer answer = agentManager.send(hostId,
new ClvmLockTransferCommand(operation, lvPath, volumeInfo.getUuid()));
if (answer == null || !answer.getResult()) {
String details = answer != null ? answer.getDetails() : "null answer";
logger.warn("CLVM lock command [{}] failed for LV [{}] on host [{}]: {}", operation, lvPath, hostId, details);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the consequence of improper locking? If we get a false answer here we just continue, what happens then?
If there is no consequence, why are we locking and unlocking?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The locking here is not about making the device accessible for the migration itself, that's handled earlier (pre-migration ACTIVATE_SHARED, which hard-fails). These post-migration ACTIVATE_EXCLUSIVE calls are state normalization.

Comment thread server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java Outdated
Comment thread server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java Outdated
@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

Comment thread server/src/main/java/com/cloud/storage/snapshot/SnapshotManagerImpl.java Outdated
Comment on lines +2749 to 2785
// Check if CLVM lock transfer is needed (even if moveVolumeNeeded is false)
// This handles the case where the volume is already on the correct storage pool
// but the VM is running on a different host, requiring only a lock transfer
boolean isClvmLockTransferNeeded = !moveVolumeNeeded &&
isClvmLockTransferRequired(newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLockTransferNeeded) {
// CLVM lock transfer - no data copy, no pool change needed
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lock transfer", "same pool to different host");
} else if (moveVolumeNeeded) {
PrimaryDataStoreInfo primaryStore = (PrimaryDataStoreInfo)newVolumeOnPrimaryStorage.getDataStore();
if (primaryStore.isLocal()) {
throw new CloudRuntimeException(
"Failed to attach local data volume " + volumeToAttach.getName() + " to VM " + vm.getDisplayName() + " as migration of local data volume is not allowed");
}
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
boolean isClvmLightweightMigration = isClvmLightweightMigrationNeeded(
newVolumeOnPrimaryStorage, existingVolumeOfVm, vm);

if (isClvmLightweightMigration) {
newVolumeOnPrimaryStorage = executeLightweightLockMigration(
newVolumeOnPrimaryStorage, vm, existingVolumeOfVm,
"CLVM lightweight migration", "different pools, same VG");
} else {
StoragePoolVO vmRootVolumePool = _storagePoolDao.findById(existingVolumeOfVm.getPoolId());

try {
HypervisorType volumeToAttachHyperType = _volsDao.getHypervisorType(volumeToAttach.getId());
newVolumeOnPrimaryStorage = _volumeMgr.moveVolume(newVolumeOnPrimaryStorage, vmRootVolumePool.getDataCenterId(), vmRootVolumePool.getPodId(), vmRootVolumePool.getClusterId(),
volumeToAttachHyperType);
} catch (ConcurrentOperationException | StorageUnavailableException e) {
logger.debug("move volume failed", e);
throw new CloudRuntimeException("move volume failed", e);
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a straightfoward refactor. I would prefer having it on this one already, but its up to you.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18218

@Pearl1594 Pearl1594 force-pushed the clvm-enhancements branch from 2fb696a to f438897 Compare June 10, 2026 19:55
@blueorangutan

Copy link
Copy Markdown

[SF] Trillian test result (tid-16274)
Environment: kvm-ol8 (x2), zone: Advanced Networking with Mgmt server ol8
Total time taken: 56161 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr12617-t16274-kvm-ol8.zip
Smoke tests completed. 151 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan package

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Jenkins job has been kicked to build packages. It will be bundled with no SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Copy link
Copy Markdown

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 18221

@Pearl1594

Copy link
Copy Markdown
Contributor Author

@blueorangutan test

@blueorangutan

Copy link
Copy Markdown

@Pearl1594 a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@sonarqubecloud

Copy link
Copy Markdown

Quality Gate Failed Quality Gate failed

Failed conditions
1 Security Hotspot
33.4% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube Cloud

@apache apache deleted a comment from blueorangutan Jun 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants