Skip to content

chore: ruff format + region-aware lint gate#206

Draft
cailmdaley wants to merge 4 commits into
refactor/src-module-layoutfrom
chore/ruff
Draft

chore: ruff format + region-aware lint gate#206
cailmdaley wants to merge 4 commits into
refactor/src-module-layoutfrom
chore/ruff

Conversation

@cailmdaley

@cailmdaley cailmdaley commented Jun 20, 2026

Copy link
Copy Markdown
Collaborator

What this does, and why

ruff is a single fast tool that both formats and lints. Wiring it in buys three things: a consistently formatted tree, so diffs carry intent rather than whitespace churn; a static net that catches a class of real bugs before runtime (undefined names, unused imports, broken format strings); and, once the pre-commit hook is installed, a guarantee the mess doesn't come back. After this lands, every commit is auto-formatted and the library is lint-gated both locally (pre-commit) and in CI. The cleanup becomes something we did once instead of something we keep doing.

The pass

This was a large one-time pass, and it holds two kinds of work. The mechanical part is essentially one command: ruff format reflowed 220 files, and the safe autofixes sorted imports, stripped trailing whitespace, and repaired invalid escape sequences. The part that took judgment was smaller but real: resolving star-imports to explicit names in the library, dropping dead variables only where the right-hand side has no side effect, and chasing down the genuinely undefined names ruff surfaced.

The commits are separated so the format reflow doesn't bury the meaningful diff. Review per commit:

  1. style(ruff) — the repo-wide format pass (mechanical; listed in .git-blame-ignore-revs).
  2. chore(ruff) — pre-commit hooks, the CI lint.yml, and the lint policy.
  3. fix(ruff) — the lint fixes themselves, the diff worth reading.
  4. chore(ruff) — scope the CI lint to the library and broaden the analysis-script ignores.

Why the policy is region-aware

A library and a one-off analysis script don't deserve the same standard, and one global policy can't serve both. So src/sp_validation is strict, with zero ignores, while the peripheral trees (workflow/scripts, papers/*/scripts, the per-paper analysis scripts, scripts/, scratch/, cosmo_inference/) are formatted, autofixed, and have their real bugs fixed, but waive a few rules that would otherwise be pure noise there.

To stay conservative, we ignore these, and only outside the library:

  • E402 (import not at top of file): these scripts manipulate sys.path and set a matplotlib backend before importing, so the imports genuinely can't move up.
  • F403 / F405 (from x import * and the names it supplies): interactive analysis scripts lean on star-imports on purpose; rewriting dozens of throwaway files to explicit imports is churn with no payoff.
  • E501 (line too long), ignored everywhere including the library: the formatter already owns line width, and what's left is unbreakable strings and URLs. This is ruff's own recommendation once you run the formatter.

The genuine bug-classes stay enforced everywhere: undefined names, broken format strings, bare excepts, dead variables. Only the structurally intentional patterns get waved through, and only in code that isn't the library.

What it already caught

Three real bugs, none of them cosmetic:

  • A file that didn't parse. papers/catalog/hist_mag.py carried a syntax error and would have failed on import; it's fixed, and the whole tree now compiles.
  • A latent NameError in the library: plots.py referenced room_ra, a typo for zoom_ra, so any call down that path would have raised. Fixed.
  • Five undefined names ruff can flag but not fix, since they need an author who knows the intent. They're left visible in the advisory check: cov_sim_gaussian, never defined, at validation_cov_cell.py:876 and :881; and index (twice) plus n_contours in the glass-mock sampling notebook.

That ruff check src/sp_validation is now fully clean is itself the proof that the star-import resolution was complete. Any name the star-import had been quietly supplying would now surface as an undefined-name error, and none do.

Open questions

How far to push region-aware. The current line is "library strict, everything else format-plus-real-bugs only." We could tighten it (lint-enforce the workflow/ rules, or the per-paper scripts) or relax it further. This is a starting point, not a conviction; the right balance depends on how the repo is actually worked.

Verification

ruff check src/sp_validation clean; ruff format --check . clean across 220 files; all 190 tracked .py compile; the test suite passes. CI on push is the authoritative gate.

— Claude on behalf of Cail

…aper analysis scripts

The gate revealed peripheral residual the per-file-ignore globs didn't reach
(analysis scripts under papers/<paper>/*.py, cosmo_inference notebooks) plus the
Snakemake-injected `snakemake` global (F821). Region-aware decision: enforce lint
on src/ only (matching the pre-commit hook), keep format repo-wide, run the
repo-wide lint advisory. src/ stays pristine (ruff check src/ => clean).
@review-notebook-app

Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@cailmdaley

Copy link
Copy Markdown
Collaborator Author

Sasha and i have decided on this project: we want the pre-commit hook to issue a warning — not a block — for all checks we run in ci. all lint checks should run at commit time too, but only produce warnings.

in ci, we want any lint failures to automatically open an issue tagging the person who made the commit. this way, work isn't blocked, but there's a record of what's wrong. the person who committed it also has to fix it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant