Skip to content

detect_result_caching: allowlist compile-cache by RHS shape, not variable name#276

Open
vjkaruna wants to merge 2 commits into
gpu-mode:mainfrom
vjkaruna:allow-compile-cache
Open

detect_result_caching: allowlist compile-cache by RHS shape, not variable name#276
vjkaruna wants to merge 2 commits into
gpu-mode:mainfrom
vjkaruna:allow-compile-cache

Conversation

@vjkaruna

Copy link
Copy Markdown

Problem

RE_RETURN_CACHE_INDEX allowlists return <cache>[...] by variable
name — it negative-lookaheads _?(?:compiled|kernel|module|func|op) _?\w*cache. Abbreviated names like _KCACHE, _CCACHE, _TILECACHE
miss the allowlist and trigger OUTPUT_REPLAY_CACHE even when the
cache implements a vanilla JIT compile-on-first-call pattern:

_KCACHE = {}
def _get_gemm(M, N, K):
    key = ("gemm", M, N, K)
    if key not in _KCACHE:
        _KCACHE[key] = _gemm.compile(M=M, N=N, K=K, ...)
    return _KCACHE[key]

Observed on a real TileLang submission. The structurally identical
Opus submission with _KERNEL_CACHE passed.

Fix

Add RE_CACHE_STORES_COMPILED_CALLABLE matching cache stores whose
RHS contains .compile(. In detect_result_caching, suppress the
direct return from cache[...] signal when this regex matches. The
value stored in the cache distinguishes legitimate compile-cache from
output replay; checking the RHS is more robust than guessing variable
names.

Only the .compile( method-call form is allowlisted. Function-call
forms like _compile_kernel(out) are deliberately NOT allowlisted —
trivially forgeable via an identity passthrough.

Validation

python3 kernelguard.py --api-mode on each fixture:

Fixture Stock 0.3.1 Patched Expected
1. TileLang _KCACHE[key] = _gemm.compile(...) hacked / OUTPUT_REPLAY_CACHE valid valid
2. TileLang _KERNEL_CACHE (regression check) valid valid valid
3. _RESULT_CACHE[data_ptr()] = out; return ... hacked hacked hacked
4. Adversarial _CACHE[key] = _compile_passthrough(out) hacked hacked hacked
5. Plain output-cache _CACHE[key] = out hacked hacked hacked

5/5 match expected outcome.

vjkaruna added 2 commits June 22, 2026 19:25
…able name

`RE_RETURN_CACHE_INDEX` allowlists `return <cache>[...]` when the cache
variable's name matches the literal substrings `compiled|kernel|module|
func|op` (with optional leading underscore + optional trailing
underscore before `cache`). This misses abbreviated names like
`_KCACHE`, `_CCACHE`, `_TILECACHE` — even though the semantic pattern
they implement (a JIT/AOT compile-cache that returns the compiled
callable on cache hit) is identical to the allowlisted `_KERNEL_CACHE`.

Observed in the wild on a TileLang submission that was wrongly rejected
with `OUTPUT_REPLAY_CACHE` despite implementing a standard
compile-on-first-call pattern:

    _KCACHE = {}

    def _get_gemm(M, N, K):
        key = ("gemm", M, N, K)
        if key not in _KCACHE:
            _KCACHE[key] = _gemm.compile(M=M, N=N, K=K, ...)  # compile-cache
        return _KCACHE[key]

The structurally identical Opus submission with `_KERNEL_CACHE` passed.

## Fix

Add `RE_CACHE_STORES_COMPILED_CALLABLE`, matching cache stores whose
RHS contains `.compile(`:

    \w*(?:cache|reuse)\w*\s*\[[^\]]+\]\s*=\s*[^=\n]*?\.compile\s*\(

Suppress the `direct return from cache[...]` signal in
`detect_result_caching` when this pattern matches the cache_scope. The
*value* in the cache distinguishes legitimate compile-cache from output
replay; checking the RHS is more robust than guessing variable-name
prefixes.

## Why method-call form only

I deliberately did NOT allowlist function-call forms like
`_compile_kernel(out)`. That form is too easy to forge — an attacker
could write `def _compile_passthrough(x): return x` and use
`_CACHE[key] = _compile_passthrough(output)` to bypass the rule.

The `.compile(` method-call form requires the cached value to actually
have a `.compile()` attribute — PyTorch tensors do not, and a
plausible passthrough that does is harder to write convincingly. A
validation fixture covers this case (Fixture 4 below).

## Validation

`python3 kernelguard.py --api-mode` on each fixture:

| Fixture | Stock 0.3.1 | Patched | Expected |
|---|---|---|---|
| 1. Real TileLang kernel with `_KCACHE[key] = _gemm.compile(...)` | hacked / OUTPUT_REPLAY_CACHE | **valid** | valid |
| 2. Real TileLang kernel with `_KERNEL_CACHE` (regression check) | valid | **valid** | valid |
| 3. True output-cache hack: `_RESULT_CACHE[data_ptr()] = out; return ...` | hacked | **hacked** / OUTPUT_REPLAY_CACHE | hacked |
| 4. Adversarial: `_CACHE[key] = _compile_passthrough(out)` (identity helper) | hacked | **hacked** / OUTPUT_REPLAY_CACHE | hacked |
| 5. Plain output-cache (no compile call): `_CACHE[key] = out` | hacked | **hacked** / OUTPUT_REPLAY_CACHE | hacked |

5/5 fixtures match expected outcome — the fix opens up the legitimate
compile-cache shape while preserving detection of every output-replay
variant tested.
…literal decoys

The original PR matched `RE_CACHE_STORES_COMPILED_CALLABLE` against
`cache_scope`, which is `python_active` (comments stripped) or the
entrypoint body. Comments can't fool the regex, but string literals
(docstrings, module-level constants) can — an attacker can plant a
decoy `_PLAUSIBLE_CACHE[key] = my_factory.compile(...)` inside a
docstring to suppress the OUTPUT_REPLAY_CACHE detection while keeping
a real output-replay cache in the active code:

    NOTE = """
      _PLAUSIBLE_CACHE[key] = my_factory.compile(M=M, N=N)
    """
    _cache = {}
    def my_kernel(act, weight):
        key = (act.shape, weight.shape)
        if key in _cache:
            return _cache[key]      # OUTPUT_REPLAY_CACHE wrongly suppressed
        answer = act @ weight
        _cache[key] = answer
        return _cache[key]

Same bug class flagged by Cursor Bugbot on our local mirror downstream
(mirendil/mirendil#1134).

## Fix

Add `strip_python_strings_and_comments(code)` next to the existing
`strip_python_comments`. It uses `tokenize` to replace COMMENT and
STRING tokens with same-length whitespace, preserving line/column
offsets. Apply it to `cache_scope` before running the new compile-cache
regex.

Side effect: the tokenizer inserts a space between tokens so
`_gemm.compile(` becomes `_gemm . compile (`. Updated the new regex to
tolerate optional whitespace around the dot.

## Validation

`python3 kernelguard.py --api-mode` on each fixture:

| Fixture | Expected |
|---|---|
| TileLang `_KCACHE[key] = _gemm.compile(...)` (legit) | valid |
| TileLang `_KERNEL_CACHE` (regression) | valid |
| Real `_RESULT_CACHE[data_ptr()] = out` | hacked |
| `_CACHE[key] = _compile_passthrough(out)` (function-call adversarial) | hacked |
| Plain `_CACHE[key] = out` (no `.compile` anywhere) | hacked |
| Real hack + comment decoy `# _C[k]=f.compile(...)` | hacked |
| Real hack + **string decoy** `"""...=f.compile(...)..."""` (NEW) | hacked |

7/7 match.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant