test(integration): de-flake otel pre-job config test by cbartz · Pull Request #803 · canonical/github-runner-operator

cbartz · 2026-06-29T13:19:20Z

What this PR does

De-flakes test_otel_collector_endpoint_pre_job_installs_config. Instead of dispatching a quick workflow, waiting for it to complete, then SSHing into the runner, it now dispatches the long-running wait workflow and inspects the runner while its job is in progress. The config read is polled to absorb the timing of the pre-job hook write.
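As a rough illustration of the polling described above, the following is a minimal sketch and not the code from this PR. The names `poll_until` and `read` are hypothetical. In the real test the `read` callable would presumably run a command on the runner over SSH and return the config file's contents, or nothing if the pre-job hook has not written it yet.

```python
import time
from typing import Callable, Optional


def poll_until(
    read: Callable[[], Optional[str]],
    timeout: float = 60.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call read() until it returns non-empty content or the timeout elapses.

    `read` is a hypothetical stand-in for "fetch the otel config from the
    runner"; it should return None or "" while the file is not there yet.
    """
    deadline = clock() + timeout
    while True:
        content = read()
        if content:
            return content
        if clock() >= deadline:
            raise TimeoutError(f"no content after {timeout} seconds")
        sleep(interval)


if __name__ == "__main__":
    # Simulate a file that only appears on the third read attempt.
    attempts = iter([None, "", "exporters: {}"])
    print(poll_until(lambda: next(attempts), timeout=5, interval=0))
```

The read is attempted at least once even with a zero timeout, and the deadline uses a monotonic clock so wall-clock adjustments on the CI host do not shorten or extend the wait.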

Why we need it

The runner is ephemeral, so its OpenStack VM is torn down on job completion. Reading the otel config after completion races the runner-manager cleanup loop: when cleanup wins, get_single_runner finds zero VMs and the test fails with "found more than one runners or no runners: []". This was observed failing on the 22.04 base while passing on 24.04 in the same run (e.g. PR #802 CI). Keeping the job in progress guarantees the VM is alive during inspection.

Checklist

I followed the contributing guide
I added or updated the documentation (if applicable) — N/A, test-only change
I updated docs/changelog.md with user-relevant changes — N/A, no user-facing change
I used AI to assist with preparing this PR
I added or updated tests as needed (unit and integration)
If this is a Grafana dashboard: I added a screenshot of the dashboard — N/A
If this is Terraform: terraform fmt passes and tflint reports no errors — N/A
If the github-runner-manager application has been changed: version updated in github-runner-manager/pyproject.toml — N/A, not changed

The test dispatched a quick workflow, waited for it to complete, then SSHed into the runner to read the otel config the pre-job script wrote. Ephemeral runner VMs are torn down on job completion, so the inspection raced the runner-manager cleanup: when cleanup won, get_single_runner found zero VMs and the test failed with an empty runner list. Dispatch the long-running wait workflow and inspect the runner while its job is in progress, so the VM is guaranteed alive. Poll on the config file to absorb the pre-job hook write timing.

cbartz requested review from florentianayuwono, javierdelapuente, weiiwang01, yanksyoon and yhaliaw as code owners June 29, 2026 13:19

github-actions Bot added the Libraries: Out of sync label Jun 29, 2026

cbartz mentioned this pull request Jun 29, 2026

fix(runner-manager): clean up VMs in error state immediately #802 (Merged, 8 tasks)

cbartz marked this pull request as draft June 29, 2026 14:04

Merge branch 'main' into fix/deflake-otel-pre-job-test

75567c1

cbartz marked this pull request as ready for review July 2, 2026 12:41

yanksyoon approved these changes Jul 3, 2026

cbartz wants to merge 2 commits into main from fix/deflake-otel-pre-job-test
