docs: clarify global immutable-OS install (kubeadm config source, boot SSH key, bootstrap teardown) #113

Open
chinameok wants to merge 13 commits into master from docs/global-install-no-ui-clarity

@chinameok chinameok commented Jun 16, 2026

Collaborator

Why

Make docs/en/global/install.mdx sufficient for another person to reproduce a fully no-UI global install on Huawei DCS (CLI/API only). Gaps were found by actually doing that install and diffing against the docs.

Changes (docs/en/global/install.mdx)

Clarity / correctness

  1. Define the source of the kubeadm config in Step 4 (the create-cluster Complete KubeadmControlPlane appendix + dcs-kubernetes-<ver>-files Secret), instead of the undefined "release manifest".
  2. Restate the ignition-required boot user / SSH key in the global DCS requirements.
  3. Decommission section: bootstrap teardown + warning that kubectl delete cluster global cascades into deleting the live control-plane VMs.

Complete worked example
4. New "Worked Example: Complete global Manifest for Huawei DCS" — one copy-pasteable file (Secret, DCSIpHostnamePool, DCSMachineTemplate, KubeadmControlPlane, DCSCluster, Cluster) with the global-specific annotations that were missing (is-global, cluster-type, os-family, kube-ovn-version, kube-ovn-join-cidr, registry-address) and a "Values to Replace" table. Sanitized to RFC 5737 IPs / placeholders.
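
As a rough illustration of the sanitization in item 4, the sketch below scans manifest text for IPv4 literals that fall outside the three RFC 5737 documentation blocks. The function name `non_documentation_ips` and the `allowed` parameter are hypothetical helpers, not part of this PR or the docs tooling, and the regex is deliberately simple, so it may also flag dotted strings that are not addresses.

```python
import ipaddress
import re

# RFC 5737 reserves these three IPv4 blocks for documentation examples.
RFC5737_NETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# Naive dotted-quad matcher; invalid candidates are filtered out below.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def non_documentation_ips(text, allowed=()):
    """Return IPv4 literals in text that are outside RFC 5737 and `allowed`."""
    nets = RFC5737_NETS + [ipaddress.ip_network(n) for n in allowed]
    leaked = []
    for token in IPV4_RE.findall(text):
        try:
            addr = ipaddress.ip_address(token)
        except ValueError:
            continue  # not a valid IPv4 address, e.g. 999.1.1.1
        if not any(addr in net for net in nets):
            leaked.append(token)
    return leaked
```

Passing `allowed=["172.18.0.0/16"]`, for example, would exempt a range that is meant to appear literally on the page.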

Operational gaps recovered from a deploy runbook
5. DCS credential Secret migration — confirmed against cpaas-installer code (installer_dcs.go dcsImportDCSCredentialSecret): the installer auto-migrates the credential Secret to the global cluster only when it is named ait-credential-secret (Secrets are excluded from the etcdctl resource migration). The worked example now uses that name; a Decommission note tells anyone using a different name to copy it manually, else the global DCS provider has no credentials and can't reconcile (scale-out fails).
6. Bootstrap NAT stall — Common Stalls row: stopping the host firewall after KIND starts can flush the KIND bridge SNAT masquerade → CAPI controllers in KIND can't reach the new control-plane subnet → KCP stuck EtcdClusterHealthy=Unknown, installer hangs. Fix: re-add the 172.18.0.0/16 masquerade rule.

Inclusive terminology
7. master → cp in example identifiers across the page (global-master-* becomes global-cp-*); kept the functional kube-ovn/role=master label (marked with a do-not-rename comment).

Still deliberately out of scope

  • OS-template ↔ provider version pairing and the version-gated os-family semantics (KubeOS must set kubeos or the node won't boot) — owned separately by the docs owner; the worked example carries the os-family field but not the version-gated rule.
  • Full DCS REST API operational recipes, qcow2 template upload, env-specific values — agent-runbook material, not customer docs.

mask format was already standardized on master by #110.

Validation

Each push validated with yarn install + yarn lint (0 errors) + yarn build in a scratch clone (the in-repo /workspaces volume is too small for node_modules).

Summary by CodeRabbit

Documentation

  • Updated Huawei DCS “global” installation guidance to require a non-empty boot user sshAuthorizedKeys list and clarified ignition behavior when the SSH key list is empty.
  • Strengthened requirements that non-encryption kubeadm/kubelet/audit/installer configuration fragments match the workload-cluster appendix (or sourced from the appropriate files Secret), with guidance on layering only global-specific fields.
  • Refreshed the worked global manifest and provider wiring/template references.
  • Added a “Decommission the Bootstrap Host” section, including warnings not to delete Cluster API/provider objects and steps to migrate non-default credential secret names to the global cluster first.

…t SSH key, bootstrap teardown)

Gaps found while reproducing a no-UI `global` install on Huawei DCS:

- Step 4 told you to "keep the release manifest's" kubeadm files without
  defining what/where that manifest is. Point to the concrete source: the
  Complete KubeadmControlPlane Configuration appendix in the DCS
  create-cluster guide (or the dcs-kubernetes-<major.minor>-files Secret).
- The ignition-required `boot` user / non-empty sshAuthorizedKeys was stated
  in the create-cluster guide but not restated in the global DCS requirements,
  so a manifest assembled from the thin fragment can omit it and fail.
- Added a Decommission step plus a warning that `kubectl delete cluster global`
  cascades into deleting the live control-plane VMs.
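
The `boot` user gap described above can be caught before a manifest is applied. The following is a minimal sketch, assuming the KubeadmControlPlane's `kubeadmConfigSpec` has already been loaded into a plain dict and that users sit under a `users` list with `name` and `sshAuthorizedKeys` keys (the upstream Cluster API field names). The helper name `boot_user_problems` is hypothetical.

```python
def boot_user_problems(kubeadm_config_spec):
    """Return a list of problems with the boot user's SSH keys (empty if fine)."""
    users = kubeadm_config_spec.get("users") or []
    boot_users = [u for u in users if u.get("name") == "boot"]
    if not boot_users:
        return ["no user named 'boot' in kubeadmConfigSpec.users"]

    # Drop empty or whitespace-only entries before counting keys.
    raw_keys = boot_users[0].get("sshAuthorizedKeys") or []
    keys = [k for k in raw_keys if isinstance(k, str) and k.strip()]
    if not keys:
        return ["boot user has an empty sshAuthorizedKeys list"]
    return []
```

An empty return value only means the list is non-empty; it does not check that the keys themselves are valid.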

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 16, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Documentation-only updates to docs/en/global/install.mdx tighten Huawei DCS manifest requirements, rename control-plane machine template references across provider examples, and add bootstrap host decommission guidance with worked-example SSH key updates.

Changes

Global Install Documentation

| Layer / File(s) | Summary |
| --- | --- |
| DCS manifest constraints (docs/en/global/install.mdx) | Requires a non-empty sshAuthorizedKeys list for the boot user and states that non-encryption kubeadm, kubelet, audit, and installer inputs must match the workload-cluster appendix or come from the dcs-kubernetes-<major.minor>-files Secret. |
| Provider template name updates (docs/en/global/install.mdx) | Changes the referenced control-plane machine template names in the Huawei DCS, VMware vSphere, and Huawei Cloud Stack global wiring fragments to global-cp-template or global-cp-machine-template. |
| Bootstrap decommission and worked example (docs/en/global/install.mdx) | Adds the “Decommission the Bootstrap Host” section, limits cleanup to the local minialauda KIND cluster and network, adds the DCS credential Secret migration note, updates the Huawei DCS complete manifest example with boot.sshAuthorizedKeys, and states that ignition rejects an empty SSH key list. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

  • alauda/immutable-infra-docs#62: Introduces the global install workflow in the same documentation area that this PR extends with tighter manifest and decommission guidance.
  • alauda/immutable-infra-docs#67: Modifies the same docs/en/global/install.mdx manifest and bootstrapping guidance that this PR further constrains and expands.
  • alauda/immutable-infra-docs#73: Updates global control-plane naming and wiring patterns that this PR continues for multiple provider examples.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main documentation changes around global immutable-OS install requirements and teardown guidance. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

chinameok and others added 2 commits June 16, 2026 11:42
Readers previously had to assemble the global DCS manifest from a differential
fragment plus the create-cluster appendix plus the infrastructure page. Add a
"Worked Example" section with a complete, copy-pasteable manifest (Secret,
DCSIpHostnamePool, DCSMachineTemplate, KubeadmControlPlane, DCSCluster, Cluster)
including the global-specific annotations (is-global, cluster-type,
os-family, kube-ovn-version, kube-ovn-join-cidr, registry-address) and a
"Values to Replace" table, linked from Step 4.

Derived from a real no-UI DCS global install; sanitized to RFC5737 example IPs
and placeholders. The three large kubeadm files use the dcs-kubernetes-<ver>-files
Secret with an inline-from-appendix fallback. Non-DR (no encryption-provider.conf).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The worked-example and decommission sections linked to #verification and
#step-1/4/5 anchors, but those headings carry no explicit {#id}, so doom lint
flags them as unmatched. Reference those sections as plain text instead, matching
the page's existing style. Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 16, 2026

Deploying alauda-immutable-infra with Cloudflare Pages

Latest commit: 2d42ccc
Status: ✅  Deploy successful!
Preview URL: https://895ebf8d.alauda-immutable-infra.pages.dev
Branch Preview URL: https://docs-global-install-no-ui-cl.alauda-immutable-infra.pages.dev

chinameok and others added 10 commits June 16, 2026 15:01
Rename the example resource names, hostnames, and machineNames from
global-master-* to global-cp-* across the page (worked example plus the
Step 4 fragments) to follow current Kubernetes inclusive terminology.

The kube-ovn/role=master node label is left unchanged because it is a
kube-ovn-recognized value; an inline comment marks it as do-not-rename.
Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rade stall

#4 (confirmed against cpaas-installer code): the installer auto-migrates the DCS
credential Secret to the global cluster ONLY when it is named ait-credential-secret
(installer_dcs.go dcsImportDCSCredentialSecret, hardcoded name; Secrets are excluded
from the etcdctl resource migration). Name the worked-example Secret
ait-credential-secret so it is carried over, and add a Decommission note: if the
credential Secret has a different name, copy it to the global cluster manually or
the DCS provider there cannot reconcile (e.g. scale-out fails).

#2: add a Common Stalls row for the silent installer hang where stopping the host
firewall after KIND starts flushes the KIND bridge SNAT masquerade, so the CAPI
controllers in KIND cannot reach the new control-plane subnet (KCP stuck
EtcdClusterHealthy=Unknown). Fix: re-add the 172.18.0.0/16 masquerade rule.

Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…stall, placeholder join CIDR

- Remove the bootstrap firewalld/masquerade Common Stalls row: the KIND host
  should simply run with firewalld stopped, so the flush-on-stop failure does
  not arise; the row presupposed a step the doc never instructs and was env-specific.
- /var/cpaas was wrongly placed as a DCSMachineTemplate disk. It holds platform
  state that must survive node replacement, so per Infrastructure Resources it
  must be a DCSIpHostnamePool.spec.pool[].persistentDisk. Move it to the pool
  (one per control-plane IP slot) and drop the template-disk entry.
- cpaas.io/kube-ovn-join-cidr was a hardcoded 100.5.0.0/16; it is an
  operator-chosen value, so use a <kube-ovn-join-cidr> placeholder and document
  it in Values to Replace. Audited the page: no other manifest-body literals
  that should be placeholders (Step 1 CIDR exports are intentional).
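
Since values such as `<kube-ovn-join-cidr>` are now placeholders, a reader copying the worked example needs a way to confirm that none were left behind. Here is a minimal sketch of such a check, assuming placeholders follow the lowercase `<name>` style used on the page. The helper name `unreplaced_placeholders` is hypothetical, and the scan does not skip YAML comments.

```python
import re

# Matches tokens such as <kube-ovn-join-cidr> or <auth-secret-name>.
PLACEHOLDER_RE = re.compile(r"<[a-z][a-z0-9.-]*>")


def unreplaced_placeholders(manifest_text):
    """Return (line_number, placeholder) pairs still present in the manifest."""
    found = []
    for line_no, line in enumerate(manifest_text.splitlines(), start=1):
        for token in PLACEHOLDER_RE.findall(line):
            found.append((line_no, token))
    return found
```

An empty result suggests every entry in the "Values to Replace" table has been filled in, though it says nothing about whether the chosen values are correct.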

Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Audited the worked-example KCP against the Complete KubeadmControlPlane
Configuration appendix and the Cluster annotations table in the create-cluster
guide. Three required fields had been dropped when hand-writing the example:

- apiServer.extraArgs.tls-cipher-suites (security hardening; present in the
  canonical KCP and in the deployed manifest).
- Cluster annotation capi.cpaas.io/kubernetes (= KubeadmControlPlane.spec.version).
- Cluster annotation cpaas.io/nodes-mode: self-managed (CAPI-managed lifecycle).

Intentional omissions kept and already documented: encryption-provider.conf and
its apiServer arg (this example is non-DR; see the DR note), and node-ip: NODE_IP
in joinConfiguration (the provider assigns the node IP).
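
The three dropped fields lend themselves to a mechanical check. The sketch below is illustrative only: it assumes the Cluster and KubeadmControlPlane objects are already parsed into dicts and that `apiServer.extraArgs` is a map, as the dotted path above suggests. The function name `global_manifest_problems` is hypothetical.

```python
def global_manifest_problems(cluster, kcp):
    """Return problems found; an empty list means all three fields are present."""
    problems = []
    annotations = cluster.get("metadata", {}).get("annotations") or {}
    spec = kcp.get("spec", {})
    version = spec.get("version")

    # The annotation must mirror KubeadmControlPlane.spec.version.
    if not version or annotations.get("capi.cpaas.io/kubernetes") != version:
        problems.append("capi.cpaas.io/kubernetes must equal KubeadmControlPlane.spec.version")

    if annotations.get("cpaas.io/nodes-mode") != "self-managed":
        problems.append("cpaas.io/nodes-mode must be self-managed")

    extra_args = (
        spec.get("kubeadmConfigSpec", {})
        .get("clusterConfiguration", {})
        .get("apiServer", {})
        .get("extraArgs")
        or {}
    )
    if "tls-cipher-suites" not in extra_args:
        problems.append("apiServer.extraArgs.tls-cipher-suites is missing")
    return problems
```

The check deliberately ignores encryption-provider.conf and node-ip, which this example omits on purpose.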

Verified with yarn lint (0 errors) and yarn build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…its body

Per review: maintaining a second copy of the ~150-line kubeadmConfigSpec in the
worked example is drift-prone (it had already dropped tls-cipher-suites). Replace
the inlined files, clusterConfiguration, preKubeadmCommands/postKubeadmCommands, and initConfiguration/joinConfiguration
with a reference to the canonical Complete KubeadmControlPlane Configuration
appendix plus the two global / non-DR deltas (etcd serverCertSANs; omit
encryption-provider.conf for non-DR). Single source of truth, no dual maintenance.

Verified with yarn lint (0 errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…example

Add a numbered prepare-first checklist before the manifest so readers do not miss
the platform-side prerequisites: (1) DCS API access + site, (2) Alauda OS VM
template, (3) DCS placement objects (compute cluster / dvSwitch / port group /
datastore), (4) control-plane IPs + API load balancer, (5) versions and IDs to
read. Each item maps to the manifest field it fills and links to the authoritative
infrastructure / create-cluster page.

Verified with yarn lint (0 errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…import ConfigMap)

Per dev clarification + re-reading cpaas-installer origin/master: the credential
Secret auto-syncs by name-match through the dcs-import-extra-resources ConfigMap
for all immutable providers (DCS included), not only when named
ait-credential-secret. ait-credential-secret is just the built-in default the
installer also tries (dcsImportDCSCredentialSecret hardcodes it and skips if
absent — no in-function fallback). Reframe the note: the Secret is copied during
install when named ait-credential-secret (no ConfigMap) OR when listed by name in
the dcs-import-extra-resources ConfigMap (same mechanism as vSphere/HCS); verify
before teardown. Drop the misleading 'only ait-credential-secret / else copy
manually' wording.
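
The reframed rule reduces to a simple either/or condition. The sketch below models it in Python purely for illustration; it is not the installer's Go code, and it represents the dcs-import-extra-resources ConfigMap as a plain collection of Secret names, because the ConfigMap's real schema is not described here. The names `secret_is_migrated` and `listed_secret_names` are hypothetical.

```python
# Built-in default name the installer also tries, per the note above.
DEFAULT_SECRET_NAME = "ait-credential-secret"


def secret_is_migrated(secret_name, listed_secret_names=()):
    """True if the Secret would be copied to the global cluster during install.

    listed_secret_names stands in for the Secret names listed in the
    dcs-import-extra-resources ConfigMap (empty when there is no ConfigMap).
    """
    return secret_name == DEFAULT_SECRET_NAME or secret_name in set(listed_secret_names)
```

A `False` result corresponds to the case the Decommission note warns about: verify the Secret exists on the global cluster before tearing down the bootstrap host.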

Verified with yarn lint (0 errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…xtra-resources ConfigMap

Do not expose the installer's internal default Secret name in the product doc.
Use the same standard ConfigMap migration as VMware vSphere / Huawei Cloud Stack
for DCS: the worked-example Secret is named global-dcs-credentials and imported
via the dcs-import-extra-resources ConfigMap (Step 7 gets a DCS ConfigMap with
just the credential Secret entry, since DCS provider CRs migrate built-in). Step 7
intro and the Decommission note updated accordingly; the internal default name is
removed from the doc.

Verified with yarn lint (0 errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…lder

Terminology sweep across the global install page for consistency:
- the temporary management cluster -> 'bootstrap cluster' (CAPI-standard term; a
  KIND cluster named minialauda, introduced once in Step 2)
- the machine that runs it -> 'bootstrap host' (was the inconsistent mix of
  'KIND host' across the original page and 'bootstrap host' I had added)
- KIND is now kept only as the implementation detail at first mention and in the
  teardown (KIND container/network)
- worked-example credential Secret uses the standard <auth-secret-name>
  placeholder (matches create-cluster / infrastructure pages) instead of a
  coined concrete name
- drop the vague 'DCS placement objects' label; list compute cluster /
  distributed virtual switch / port group / datastore explicitly

Verified with yarn lint (0 errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…gistry, auto-rewritten to the permanent registry post-install

Confirmed against code: cpaas.io/registry-address starts as the bootstrap host's
registry (<bootstrap-host-ip>:11443) — the value provisioned nodes pull from, and
the DCS provider reads it for kube-proxy/CoreDNS/kube-ovn image repositories
(cluster-api-provider-dcs dcscluster_controller.go). After the global registry is
deployed, cpaas-installer rewrites the annotation on the Cluster and DCSCluster to
the permanent platform registry (dcsUpdateClusterRegistryAnnotations in
installer_dcs.go), so later reconciles use the global cluster's registry. The doc
previously omitted this auto-rewrite; Step 1 now states it.
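
To make the before and after states concrete, here is a small Python model of the rewrite. It is a sketch of the behavior described above, not the installer's implementation: objects are plain dicts, and the function name `rewrite_registry_annotation` and the registry values in the usage note are hypothetical.

```python
REGISTRY_ANNOTATION = "cpaas.io/registry-address"


def rewrite_registry_annotation(objects, permanent_registry):
    """Point Cluster and DCSCluster objects at the permanent registry.

    Mutates the dicts in place and returns how many objects were updated.
    Other kinds are left untouched.
    """
    updated = 0
    for obj in objects:
        if obj.get("kind") in ("Cluster", "DCSCluster"):
            annotations = obj.setdefault("metadata", {}).setdefault("annotations", {})
            annotations[REGISTRY_ANNOTATION] = permanent_registry
            updated += 1
    return updated
```

For example, a Cluster that starts with `192.0.2.10:11443` (standing in for `<bootstrap-host-ip>:11443`) would carry the permanent registry address after the call, while a Secret in the same list would be unchanged.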

Verified with yarn lint (0 errors).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>