ci(cli): add Windows x64 build verification to PR and release workflows #1857

@jeffmaury

Problem Statement

The OpenShell CLI is currently built only for Linux (amd64/arm64) and macOS (arm64), and it is linted and tested only on Linux. Nothing verifies that the CLI builds for Windows x64 during pull request validation or the release process. Windows support could therefore regress without detection, and Windows users have no pre-built CLI binaries.

This spike investigates what's needed to add Windows x64 build verification to both PR checks and release workflows, ensuring the CLI can be built and distributed for Windows users.

Technical Context

The current CI/CD pipeline uses:

  • PR validation (branch-checks.yml): Runs lint and tests on Linux runners only
  • Release builds (release-tag.yml, release-dev.yml): Produces Linux musl binaries (via Zig cross-compilation) and macOS binaries (via osxcross in Docker)
  • Python packaging (pyproject.toml): Declares support for Linux and macOS only

The CLI crate has Unix-specific dependencies (primarily the nix crate) used for signal handling, process execution, and terminal control. There is no existing Windows build configuration or cross-compilation setup for Windows targets.

Affected Components

| Component | Key Files | Role |
|---|---|---|
| PR validation workflow | .github/workflows/branch-checks.yml (lines 80-136) | Runs Rust lint/test on Linux only; needs Windows job |
| Release workflows | .github/workflows/release-tag.yml (lines 239-409), release-dev.yml (lines 208-376) | Build CLI for Linux musl + macOS; needs Windows build job |
| CLI crate | crates/openshell-cli/Cargo.toml (line 75) | Depends on Unix-only nix crate; needs conditional compilation |
| CLI signal handling | crates/openshell-cli/src/ssh.rs, src/run.rs | 17 #[cfg(unix)] blocks for signals; needs Windows stubs |
| Python packaging | pyproject.toml (lines 36-37) | Declares Linux/macOS only; optional Windows wheel support |

Technical Investigation

Architecture Overview

Current Build Pipeline:

  1. PR checks: branch-checks.yml triggers on pull-request/* branches, runs Rust job on linux-amd64-cpu8 and linux-arm64-cpu8 self-hosted runners inside the ghcr.io/nvidia/openshell/ci:latest Linux container.

  2. Release builds:

    • build-cli-linux job uses native Linux runners with Zig toolchain for musl cross-compilation (targets: x86_64-unknown-linux-musl, aarch64-unknown-linux-musl)
    • build-cli-macos job runs on Linux with Docker, using osxcross for aarch64-apple-darwin target
    • Artifacts packaged as .tar.gz and uploaded to GitHub releases

  3. Dependencies:

    • bundled-z3 feature: statically links Z3 solver (C++ dependency)
    • nix crate (workspace dependency, line 66 in root Cargo.toml): provides Unix signal handling, process APIs, file descriptor manipulation

Current Platform-Specific Code:

  • ssh.rs: 13 #[cfg(unix)] blocks for signal handling (SIGINT, SIGQUIT, SIGTERM, SIGWINCH), CommandExt::exec() (replace process image), and symlink handling
  • run.rs: 4 #[cfg(unix)] blocks for stdin thread spawn and terminal resize (SIGWINCH) handling
  • auth.rs: Already has Windows support for browser opening (cmd /C start)

Code References

| Location | Description |
|---|---|
| .github/workflows/branch-checks.yml:84-136 | Rust job matrix - currently Linux amd64/arm64 only |
| .github/workflows/release-tag.yml:239-346 | build-cli-linux job - Linux musl cross-compilation |
| .github/workflows/release-tag.yml:351-409 | build-cli-macos job - macOS cross-compilation via Docker + osxcross |
| crates/openshell-cli/Cargo.toml:75 | nix dependency - Unix-only crate |
| crates/openshell-cli/src/ssh.rs:* | 13 Unix-specific #[cfg(unix)] blocks for signal handling |
| crates/openshell-cli/src/run.rs:* | 4 Unix-specific blocks for SIGWINCH resize handling |
| Cargo.toml:66 | Workspace nix dependency with Unix-specific features |
| deploy/docker/Dockerfile.cli-macos | Example cross-compilation setup using osxcross |
| pyproject.toml:36-37 | Python classifier declaring Linux/macOS support only |

Current Behavior

PR validation:

  • Lints and tests CLI on Linux amd64/arm64 runners
  • No Windows build verification
  • Uses Linux container with mise, sccache, and other dev tools

Release process:

  • Builds Linux musl binaries (static, portable) for amd64/arm64
  • Builds macOS arm64 binary via cross-compilation in Docker
  • No Windows binary produced
  • Python wheels only for Linux (amd64/arm64) and macOS (arm64)

Build constraints:

  • nix crate will fail to compile on Windows targets
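  • The signal-handling code targets POSIX signals (SIGINT, SIGWINCH, etc.), which Windows does not provide in the same form
  • CommandExt::exec() replaces the current process image through the Unix exec family of syscalls; it is a Unix-only extension trait method with no Windows equivalent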
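
The exec() constraint can be illustrated with a minimal sketch that uses only the standard library. The helper name hand_off is hypothetical and does not come from the OpenShell source; it only shows one way a Unix exec path and a non-Unix spawn-and-wait fallback could share a signature.

```rust
use std::io;
use std::process::Command;

/// Hypothetical helper (not from the OpenShell source): hand control to `cmd`.
/// Returns only on failure, mirroring the contract of `CommandExt::exec`.
#[cfg(unix)]
pub fn hand_off(mut cmd: Command) -> io::Error {
    use std::os::unix::process::CommandExt;
    // Replaces the current process image; on success this call never returns.
    cmd.exec()
}

/// Fallback for targets without exec: run the child to completion, then exit
/// with its status code so callers still never see a successful return.
#[cfg(not(unix))]
pub fn hand_off(mut cmd: Command) -> io::Error {
    match cmd.status() {
        // `code()` is `None` when no exit code is available; report failure.
        Ok(status) => std::process::exit(status.code().unwrap_or(1)),
        Err(err) => err,
    }
}
```

With this shape the parent process stays alive on Windows for the lifetime of the child, which is the "long-lived process" behavior discussed under Risks & Open Questions.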

What Would Need to Change

1. PR Validation Workflow (.github/workflows/branch-checks.yml)

Add Windows to the Rust job matrix (lines 84-136):

matrix:
  include:
    - runner: linux-amd64-cpu8
      container: ghcr.io/nvidia/openshell/ci:latest
    - runner: linux-arm64-cpu8
      container: ghcr.io/nvidia/openshell/ci:latest
    - runner: windows-latest        # New
      container: null                # Windows doesn't use Linux containers

Windows-specific steps:

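  • Install Rust via the dtolnay/rust-toolchain action, matching the release job (rustup-init.exe is an alternative); the mise setup lives in the Linux CI container, which this job does not use
  • Skip tooling provided by the Linux CI container (sccache, mise)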
  • Build with cargo build --release -p openshell-cli --target x86_64-pc-windows-msvc --features bundled-z3
  • Run smoke test: openshell.exe --version
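
The smoke test step could also be expressed as a small Rust check, sketched below under assumptions: the function names are hypothetical, and the only facts taken from this issue are the binary name and the --version flag.

```rust
use std::env::consts::EXE_SUFFIX;
use std::io;
use std::path::Path;
use std::process::Command;

/// File name of the CLI binary on the current platform:
/// `openshell.exe` on Windows, `openshell` elsewhere.
pub fn cli_binary_name() -> String {
    format!("openshell{}", EXE_SUFFIX)
}

/// Hypothetical smoke test: run `<binary> --version` and return its trimmed
/// stdout, or an error if the binary cannot start or exits unsuccessfully.
pub fn version_smoke_test(binary: &Path) -> io::Result<String> {
    let output = Command::new(binary).arg("--version").output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{} --version exited with {}", binary.display(), output.status),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

In CI the path passed in would be the build output, for example target/x86_64-pc-windows-msvc/release joined with cli_binary_name().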

2. Release Workflows (.github/workflows/release-tag.yml, release-dev.yml)

Add new build-cli-windows job after build-cli-macos (around line 410 in release-tag.yml):

build-cli-windows:
  name: Build CLI (Windows x64)
  runs-on: windows-latest
  steps:
    - uses: actions/checkout@v4
    - name: Install Rust
      uses: dtolnay/rust-toolchain@stable
      with:
        targets: x86_64-pc-windows-msvc
    - name: Build CLI
      run: cargo build --release -p openshell-cli --target x86_64-pc-windows-msvc --features bundled-z3
    - name: Package binary
      run: |
        mkdir artifacts
        tar -czf artifacts/openshell-x86_64-pc-windows-msvc.tar.gz -C target/x86_64-pc-windows-msvc/release openshell.exe
    - uses: actions/upload-artifact@v4
      with:
        name: cli-windows-x64
        path: artifacts/openshell-x86_64-pc-windows-msvc.tar.gz

Update release job dependencies to include build-cli-windows and add Windows artifact to checksum generation and GitHub release upload (lines 906-928).

3. CLI Source Code

Make nix dependency conditional (crates/openshell-cli/Cargo.toml):

[target.'cfg(unix)'.dependencies]
nix = { workspace = true }

[target.'cfg(windows)'.dependencies]
windows-sys = { version = "0.52", features = ["Win32_System_Console"] }  # Optional, if using Win32 APIs

Wrap Unix-specific code in ssh.rs and run.rs:

#[cfg(unix)]
fn setup_signal_handlers() {
    // Existing nix-based signal handling
}

#[cfg(windows)]
fn setup_signal_handlers() {
    // Stub or Windows equivalent (e.g., SetConsoleCtrlHandler)
    warn!("Signal handling not fully supported on Windows");
}

4. Python Packaging (Optional)

Update pyproject.toml to declare Windows support (line 37):

classifiers = [
    "Operating System :: POSIX :: Linux",
    "Operating System :: MacOS",
    "Operating System :: Microsoft :: Windows",  # New
]

Add Windows wheel build task to tasks/python.toml:

[tasks."python:build:windows"]
run = "maturin build --release --target x86_64-pc-windows-msvc -o dist/"

Alternative Approaches Considered

Alternative 1: Docker-based Cross-Compilation (like macOS)

Use MinGW-w64 toolchain in Docker to cross-compile Windows binary from Linux.

Pros:

  • Consistent with macOS build pattern
  • Uses existing self-hosted Linux runners
  • Easier to reproduce locally via Docker

Cons:

  • MinGW toolchain complexity (C++ dependencies, potential ABI issues)
  • Harder to debug than native Windows build
  • May have compatibility issues with bundled-z3 (MSVC vs MinGW)

Decision: Start with native Windows runner (simpler), revisit cross-compilation if Windows becomes a primary platform.

Alternative 2: Defer Windows Support Until User Demand

Wait for users to request Windows binaries before adding CI.

Pros:

  • Avoids premature optimization
  • Focuses effort on proven needs

Cons:

  • Doesn't address stated requirement
  • Creates regression risk if Windows support exists but is untested

Decision: Implement verification now per user request.

Alternative 3: Full Feature Parity vs Build-Only Verification

Should Windows CLI have full functionality (SSH, signals) or just basic commands?

Pros of full parity:

  • Better user experience
  • Avoids "second-class citizen" perception

Cons of full parity:

  • Significantly more work (2-4 weeks vs 2-3 days)
  • Requires Windows expertise and testing infrastructure
  • User request only asks for "build generation verification"

Decision: Start with build verification and smoke tests (--version). Document limitations. Add feature parity in follow-on work if users report issues.

Patterns to Follow

Cross-compilation precedent:

  • Dockerfile.cli-macos shows how to add new platform builds via Docker
  • Linux musl builds show how to use Rust target triples (x86_64-unknown-linux-musl) and static linking (bundled-z3)

Platform-specific code:

  • auth.rs already uses #[cfg(target_os = "windows")] for browser opening
  • Establish pattern: make nix dependency conditional, provide Windows stubs or equivalents

Artifact packaging:

  • Existing .tar.gz packaging (lines 906-928 in release-tag.yml) works for Windows binaries with minimal changes
  • Checksum generation and GitHub release upload already handle multiple artifacts

Proposed Approach

Add Windows x64 build verification in two phases:

Phase 1: PR Validation

  • Add windows-latest runner to branch-checks.yml Rust job matrix
  • Install Rust via dtolnay/rust-toolchain action
  • Build CLI with cargo build --release -p openshell-cli --target x86_64-pc-windows-msvc --features bundled-z3
  • Run smoke test: openshell.exe --version
  • Make nix dependency conditional ([target.'cfg(unix)'.dependencies] in crates/openshell-cli/Cargo.toml)
  • Stub out Unix-specific signal handling in ssh.rs and run.rs with Windows no-op implementations

Phase 2: Release Builds

  • Add build-cli-windows job to release-tag.yml and release-dev.yml
  • Package Windows binary as .tar.gz (matches Linux musl binaries)
  • Add artifact to checksum generation and GitHub release upload
  • Update README to note Windows x64 CLI availability with documented limitations

Out of scope (defer to follow-on work):

  • Python wheels for Windows
  • Full signal handling parity on Windows
  • Windows-specific SSH connection lifecycle testing

Scope Assessment

  • Complexity: Medium
  • Confidence: High — clear path forward, proven patterns exist
  • Estimated files to change: 6-8
    • .github/workflows/branch-checks.yml
    • .github/workflows/release-tag.yml
    • .github/workflows/release-dev.yml
    • crates/openshell-cli/Cargo.toml
    • crates/openshell-cli/src/ssh.rs
    • crates/openshell-cli/src/run.rs
    • Cargo.toml (optional: add windows-sys to workspace dependencies)
    • README.md (document Windows support + limitations)
  • Issue type: ci (primary), feat (secondary)

Risks & Open Questions

Critical Risks:

  • nix crate dependency: CLI will not compile on Windows without conditional compilation. Must wrap all signal handling in #[cfg(unix)] and provide Windows stubs.
  • CWE-252 (Unchecked Return Value): Windows signal equivalents (e.g., SetConsoleCtrlHandler) have different error semantics than Unix signals. Missing error checks could cause silent failures.
  • Behavior divergence: SSH connection lifecycle differs on Windows (no exec() syscall). The CLI may require fallback to long-lived process + manual cleanup.
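
To make the CWE-252 concern concrete, here is a hedged sketch of installing a console control handler with the return value checked. The binding to SetConsoleCtrlHandler is written by hand only to keep the example dependency-free; in the crate, windows-sys would normally supply it. All function and static names are hypothetical.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set when the user requests an interrupt (Ctrl+C on Windows).
pub static INTERRUPTED: AtomicBool = AtomicBool::new(false);

pub fn interrupted() -> bool {
    INTERRUPTED.load(Ordering::SeqCst)
}

#[cfg(windows)]
mod console {
    use super::INTERRUPTED;
    use std::io;
    use std::sync::atomic::Ordering;

    const CTRL_C_EVENT: u32 = 0;

    // Hand-written binding (assumption for this sketch); edition 2024
    // spells this block `unsafe extern "system"`.
    extern "system" {
        fn SetConsoleCtrlHandler(
            handler: Option<unsafe extern "system" fn(u32) -> i32>,
            add: i32,
        ) -> i32;
    }

    unsafe extern "system" fn on_ctrl(ctrl_type: u32) -> i32 {
        if ctrl_type == CTRL_C_EVENT {
            INTERRUPTED.store(true, Ordering::SeqCst);
            1 // TRUE: event handled
        } else {
            0 // FALSE: let the next handler run
        }
    }

    pub fn install() -> io::Result<()> {
        // SAFETY: `on_ctrl` matches the handler signature and only stores to an atomic.
        let ok = unsafe { SetConsoleCtrlHandler(Some(on_ctrl), 1) };
        if ok == 0 {
            // The check CWE-252 is about: a zero return means the call failed.
            return Err(io::Error::last_os_error());
        }
        Ok(())
    }
}

#[cfg(windows)]
pub fn install_interrupt_handler() -> io::Result<()> {
    console::install()
}

/// On other targets the existing nix-based handlers would stay in place.
#[cfg(not(windows))]
pub fn install_interrupt_handler() -> io::Result<()> {
    Ok(())
}
```

Returning io::Result from the installer lets the caller decide whether a failed registration is fatal or only worth a warning.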

High Risks:

  • Supply chain risk: Building Z3 for Windows (bundled-z3) adds MSVC or MinGW toolchain dependencies not currently audited.
  • Maintenance burden: Windows builds add a third platform to maintain, test, and debug. Need to establish who owns Windows support.

Open Questions:

  1. Should Windows CLI support SSH connections? Current implementation uses Unix-specific exec() to replace process image. Windows alternative would be long-lived proxy process with manual cleanup.
  2. What is acceptable feature parity level? Request asks for "build generation verification" but doesn't specify whether openshell sandbox connect should work on Windows.
  3. Who will test and debug Windows-specific issues? Need Windows expertise on the team or in community.
  4. Should we add Windows to e2e test suite? Current e2e tests assume Linux. Windows gateway deployment may not be supported.
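
For open question 1, the "long-lived proxy process with manual cleanup" option could look roughly like the guard below. This is a sketch under assumptions, not the current implementation: ChildGuard is a hypothetical type, and it uses only std::process.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};

/// Hypothetical guard: owns a child process and terminates it if the guard
/// is dropped before the child has been waited on.
pub struct ChildGuard(Option<Child>);

impl ChildGuard {
    pub fn spawn(cmd: &mut Command) -> io::Result<Self> {
        Ok(Self(Some(cmd.spawn()?)))
    }

    /// Wait for normal completion; the drop handler then has nothing to kill.
    pub fn wait(mut self) -> io::Result<ExitStatus> {
        let mut child = self.0.take().expect("child is present until wait or drop");
        child.wait()
    }
}

impl Drop for ChildGuard {
    fn drop(&mut self) {
        if let Some(mut child) = self.0.take() {
            // Best-effort cleanup: errors are ignored because the child may
            // already have exited.
            let _ = child.kill();
            let _ = child.wait(); // reap so no zombie or stale handle remains
        }
    }
}
```

The same type works on Unix and Windows, so it would not need its own #[cfg] split; only the decision to use it instead of exec() would be platform-specific.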

Design Decisions Needed:

  1. Native Windows runner vs MinGW cross-compilation? Recommendation: Start with native runner for simplicity.
  2. Scope of Windows feature support? Recommendation: Build verification only (compile + --version), document limitations, add features in follow-on work.
  3. Python wheel packaging? Recommendation: Defer to follow-on work, release binary tarball only.

Test Considerations

Testing strategy:

  • Unit tests: Existing CLI tests should pass on Windows (or be skipped if Unix-specific)
  • Smoke tests: Run openshell.exe --version in CI to verify binary is executable
  • Integration tests: Defer Windows-specific SSH/sandbox tests to follow-on work
  • E2E tests: Current e2e suite assumes Linux gateway; Windows client-only testing initially

Test infrastructure needs:

  • GitHub Actions windows-latest runner (no additional setup needed)
  • Windows smoke test job in release workflow to verify binary is not dynamically linked against unexpected DLLs
  • Consider adding #[cfg_attr(windows, ignore)] to Unix-specific tests

Existing test patterns:

  • CLI tests in crates/openshell-cli/tests/ should be reviewed for Unix assumptions
  • Some tests may need #[cfg(unix)] guards or Windows equivalents
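
The two guard styles mentioned above behave differently, which the following sketch shows: #[cfg(unix)] removes a test from Windows builds entirely, while #[cfg_attr(windows, ignore)] still compiles it and reports it as ignored. The helper sh_exit_code and both test names are hypothetical.

```rust
/// Hypothetical helper under test: exit code of `sh -c <script>` (Unix only).
#[cfg(unix)]
pub fn sh_exit_code(script: &str) -> Option<i32> {
    use std::process::Command;
    Command::new("sh").arg("-c").arg(script).status().ok()?.code()
}

#[cfg(test)]
mod tests {
    // Not compiled on Windows at all, because the helper does not exist there.
    #[test]
    #[cfg(unix)]
    fn sh_reports_exit_code() {
        assert_eq!(super::sh_exit_code("exit 3"), Some(3));
    }

    // Compiled everywhere, but skipped on Windows, where a path with a
    // leading slash and no drive prefix is not absolute.
    #[test]
    #[cfg_attr(windows, ignore = "asserts Unix path semantics")]
    fn leading_slash_is_absolute() {
        assert!(std::path::Path::new("/usr/bin").is_absolute());
    }
}
```

Preferring cfg_attr where the test body still compiles keeps ignored tests visible in the Windows test report, which makes the coverage gap easier to track.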

Test coverage gaps:

  • No current Windows-specific test coverage
  • No validation that stubbed functionality (signal handling) behaves acceptably on Windows
  • No Windows gateway deployment testing (likely not a target platform for gateway)

Documentation Impact

  • README.md: Add Windows x64 to supported platforms list, note limitations (no signal handling, SSH may have reduced functionality)
  • docs/: Add Windows installation instructions (download binary, extract, add to PATH)
  • architecture/: No impact (build infrastructure, not runtime architecture)
  • docs/reference/gateway-config.mdx: No impact (CLI builds don't affect gateway config)

Created by spike investigation. Use build-from-issue to plan and implement.
