Skip to content

fix(ci): Execute newly added or modified scheduled_only tests in pre-submits#4339

Open
mesakhcienet wants to merge 1 commit into
AI-Hypercomputer:mainfrom
CIeNET-International:fix/scheduled-tests-ci
Open

fix(ci): Execute newly added or modified scheduled_only tests in pre-submits#4339
mesakhcienet wants to merge 1 commit into
AI-Hypercomputer:mainfrom
CIeNET-International:fix/scheduled-tests-ci

Conversation

@mesakhcienet

@mesakhcienet mesakhcienet commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Description

This change makes the pre-submit execute the scheduled_only tests that a PR introduces or modifies, so each is verified at least once before merge, while unrelated scheduled_only tests remain nightly-only (unchanged).

How it works

  • A pytest collection hook in tests/conftest.py computes the set of tests a PR adds or modifies — via git diff --unified=0 origin/<base>...HEAD — and dynamically applies a new newly_added marker to the matching collected tests.
  • The two test-runner workflows change the pre-submit marker from <marker> and not scheduled_only to <marker> and (not scheduled_only or newly_added). Scheduled (nightly) runs are untouched and continue to run everything.

Detection details / robustness

  • Diff parsing lives in a small, stdlib-only, unit-tested module tests/utils/newly_added_detection.py (parse_changed_tests is pure; get_changed_tests wraps git). Keeping it out of conftest.py lets it be tested without importing absl/pytest/jax.
  • Matching is file-qualified(file_path, test_name) — so identically named tests in different files (e.g. test_basic_call appears in 6 files) do not cross-contaminate and pull unrelated scheduled_only tests into pre-submit.
  • .gitattributes sets *.py diff=python so git diff hunk headers expose the enclosing def test_* for edits inside class methods (the suite is overwhelmingly class-based). This enables detection of modified tests, not just brand-new ones.
  • The package-test checkout uses fetch-depth: 0 so the three-dot origin/main...HEAD diff has a reachable merge-base (otherwise the shallow clone forces a less precise fallback).
  • The base branch falls back to GITHUB_BASE_REF (then main), so PRs targeting a non-main branch are handled.

Files changed

File Change
tests/conftest.py Collection hook applies newly_added (file-qualified match)
tests/utils/newly_added_detection.py (new) Pure diff parser + git wrapper
tests/unit/newly_added_detection_test.py (new) Unit tests for the parser
.github/workflows/run_tests_against_package.yml Marker expression + fetch-depth: 0
.github/workflows/run_pathways_tests.yml Marker expression
.gitattributes (new) *.py diff=python
pytest.ini Register newly_added marker

Shortcomings / future work

  • Detection is a no-op when maxtext_installed=true (the runner does cd /deps, which is not a git work tree). Not on the current PR pre-submit path, but worth noting.
  • run_pathways_tests.yml checks out via a hardcoded git clone https://github.com/google/maxtext.git, which does not resolve fork build SHAs (pre-existing, out of scope here).
  • A change that only deletes lines from a test (no added lines) is not flagged as modified.

Tests

New unit tests in tests/unit/newly_added_detection_test.py (marked cpu_only) cover the diff parser across: new top-level function, new class method, modified method detected via the hunk header, cross-file name-collision isolation, non-test / src/ changes ignored, and multi-file diffs.

All commands run locally against the repo .venv (Python 3.12).

1. Unit tests

$ python3 -m pytest tests/unit/newly_added_detection_test.py -q
6 passed, 1 warning in 0.14s

2. Lint gate (same hooks as code_quality.yml)

$ pre-commit run --files tests/conftest.py tests/utils/newly_added_detection.py \
    tests/unit/newly_added_detection_test.py .gitattributes \
    .github/workflows/run_tests_against_package.yml pytest.ini
codespell....................Passed
pylint.......................Passed
pyink........................Passed
yamllint.....................Passed

3. End-to-end through the real conftest.py — a temporary scheduled_only probe test committed on the branch, collected through the actual tests/conftest.py:

# The new scheduled_only test IS rescued by the newly_added marker:
$ python3 -m pytest -m "cpu_only and (not scheduled_only or newly_added)" tests/unit/_probe_test.py -q
1 passed

# Baseline: under the old marker it would have been skipped:
$ python3 -m pytest -m "cpu_only and not scheduled_only" tests/unit/_probe_test.py -q
1 deselected

This confirms the full chain: git diff → file-qualified parse → newly_added marker → survives -m selection → the scheduled_only test runs. The .gitattributes change was separately verified to make git diff --unified=0 emit @@ ... @@ def test_column(...) headers for edits inside class methods (instead of the enclosing class ... line).

4. CI: the two modified workflows exercise this change on the actual TPU / GPU / CPU and Pathways pre-submit runners in this PR.

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests tests and provided workload links above if applicable. — N/A: CI-infrastructure change with no ML workload; verified via unit tests, the lint gate, an end-to-end conftest probe (above), and the modified workflows running in this PR's pre-submit.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation. — No user-facing docs affected.

@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Comment thread tests/utils/newly_added_detection.py Outdated
Comment thread tests/unit/newly_added_detection_test.py Outdated
return found


def get_changed_tests(base_ref=None):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to detect both newly added tests and the modified tests, or just new tests? @shralex

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

detecting modified tests sounds great

Comment thread tests/conftest.py
changed_tests = get_changed_tests() # set[(file_path, test_name)]
if changed_tests:
for item in items:
item_file = item.nodeid.split("::", 1)[0]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

item_file = os.path.abspath(item.nodeid.split("::", 1)[0])

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kept it relative (reasoning in the newly_added_detection.py:185 / :191). nodeid is rootdir-relative but abspath is CWD-relative, so from a subdir it doubles the path and the match breaks. pytest.ini pins rootdir = repo root, so relative-vs-relative is already exact.

Comment thread tests/conftest.py Outdated
Comment thread tests/utils/newly_added_detection.py Outdated
Comment thread tests/utils/newly_added_detection.py
except OSError:
continue
for name in touched_test_names(source, touched_lines):
changed.add((path, name))

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

full_path

@mesakhcienet mesakhcienet Jul 3, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion

I tried it but ended up keeping the repo-relative path, because abspath turned out to break the match. It resolves against the current working directory, while the conftest side (item.nodeid) is relative to the pytest rootdir. When those differ they diverge: collecting from tests/unit/ turns os.path.abspath(nodeid) into .../tests/unit/tests/unit/..._test.py, so the tuple no longer matches.

Since pytest.ini fixes rootdir = repo root = git toplevel, the git diff paths and the nodeid paths are already the same string, so the current relative compare is exact and CWD-independent. abspath would also pull in a repo_root lookup that isn't available on the maxtext_installed:true /deps runner.

I verified this empirically, but happy to revisit if there's a case where the paths don't line up, or you want us to strictly implement , thanks!

@mesakhcienet

mesakhcienet commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator Author

Thanks for the detailed review, @shralex @xibinliu — pushed an update addressing the feedback. summary of what changed:

Applied as suggested

  • Detect modified tests, not just new ones — the detector now maps each changed line (from git diff) onto the AST span of every test, so an edited test is flagged as well as a brand-new one (covers funtcions, class methods, async def, and decorator lines).
  • Wording — the newly_added marker help now reads "newly introduced or modified tests".
  • Copyright headers bumped to 2026 on both new files.

Adjusted with a small deviation (details in the threads)

  • Diff fallback strategies — adopted the three-dot (merge-base) ranges but dropped the two-dot ones. Two-dot compares tip-vs-tip and over-reports commits that landed on the base after the branch point (3 → 11 test files here, ~64 with a stale local main), which would force unrelated scheduled_only tests to run. Kept origin/{base}...HEAD and a local {base}...HEAD fallback, exctracted into _build_diff_commands() with unit tests.
  • Absolute paths for the match key — kept the match relative instead. item.nodeid is rootdir-relative while os.path.abspath() is CWD-relative, so it breaks when pytest runs from a subdirectory. pytest.ini fixes rootdir = repo root = git toplevel, so the current relative compare is already exact. Verified empirically — happy to revisit if there's a case I've missed.

Verification

  • 17 unit tests (line-range parsing + line→test mapping, incl. the modified-test cases) — all green.
  • Lint gate (codespell / pylint / pyink) — passing.
  • Real-git behavioural check + an end-to-end conftest probe confirming a scheduled_only test the PR touches now runs in pre-submit while an untouched one stays deselected.

Feel free to suggest changes. Thank you!

@mesakhcienet mesakhcienet force-pushed the fix/scheduled-tests-ci branch from 14d996d to 71b8779 Compare July 3, 2026 03:33
@mesakhcienet mesakhcienet requested review from shralex and xibinliu July 3, 2026 04:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants