Pre-warm cold OptimizedDirectorySourceLocator caches before spawning workers#5846
Pre-warm cold OptimizedDirectorySourceLocator caches before spawning workers#5846SanderMuller wants to merge 1 commit into
Conversation
32188a1 to
aaf28cf
Compare
ondrejmirtes
left a comment
There was a problem hiding this comment.
Please only contribute the file hash algorithm change.
Source locator caches are already prewarmed, feel free to verify my claims, but the way it works today is that the main thread create a file on disk for the most expensive locator (OptimizedDirectorySourceLocator) and the child processes only read that which pretty fast.
| && function_exists('pcntl_waitpid') | ||
| && !$this->isOpcacheEnabled() | ||
| ) { | ||
| return $this->hashAndFindSymbolsParallel($filesWithCachedHashes, $supportsEnums); |
There was a problem hiding this comment.
I don't think this is worth it. I don't want to have yet-another parallel mechanism hidden inside here. I think the gains from this would pretty minimal.
| $hashes = []; | ||
| foreach ($this->allConfigFiles as $file) { | ||
| $hash = hash_file('sha256', $file); | ||
| $hash = hash_file('xxh128', $file); |
There was a problem hiding this comment.
I'd welcome this but it'd be nice to have a new FileHasher class (akin to FileReader) and we'd have to switch the hash based on PHP_VERSION_ID (This is available only since 80100).
aaf28cf to
61c3e2b
Compare
|
I took the invitation to verify seriously — both directions. Three changes to the PR based on what I found:
On the "source locator caches are already prewarmed" claim — I instrumented So on a cold cache the main thread does not create the cache file first — all 8 workers race the same scans (hash + symbol extraction of ~6,800 files each, including the autoload-dev End-to-end, measured with hyperfine (cold, tmpDir wiped per run,
So the honest end-to-end story in default mode: no wall-time win (the duplicated scans overlap across workers), but ~15% less total CPU per cold run — which matters for billed CI minutes and contended runners. With forked workers it is a wall win too. If a CPU-only improvement in the default mode doesn't clear the bar for you, I'm fine closing this; the trace above might still be useful as a data point about cold-run behaviour. |
|
This goes against the latest fixes we did. This will actually perform much worse in some cases because typically the main thread doesn't need to initialize the source locator at all. See #5577. It'd make sense to still initialize just the OptimizedDirectorySourceLocator, so that child workers save time. |
Will ook into that case, thanks for the reference |
Building the source locator eagerly scans the analysed directories and composer classmap directories and writes the shared file cache. Without this, every parallel worker redoes the same symbol scan against a cold cache: measured 8x duplicated scanning CPU at 8 workers, and removing this pre-warm costs +51% wall / +86% CPU on a 14-core cold run of src/Type (with PHPSTAN_PARALLEL_FORK=1, forked workers additionally inherit the warm in-memory state via copy-on-write). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
61c3e2b to
11d3e17
Compare
|
Thanks — I read #5577 and phpstan/phpstan#14072 properly now, and I see the conflict: my version forced the full Reworked as you suggested, with one refinement: the pre-warm now touches only the
The cold numbers on this corpus are modest because |
|
Adding
Warm: identical — the cold-probe costs nothing measurable. Cold: wall is also identical (the duplicated worker scans overlap in time), but user CPU drops 128.1 s → 123.8 s (−3.4%). So on this machine the spawn-mode benefit is purely CPU; the wall-time win only shows up with |
What & why
On a cold cache, every parallel worker independently rebuilds the directory source
locators for the analysed paths: instrumenting
OptimizedDirectorySourceLocatorFactoryon current 2.2.x shows all 8 workers scanning the analysed directory on a cold run while
the main process never does.
This PR fills the
OptimizedDirectorySourceLocatorsymbol cache for analysed directoriesin the main thread before spawning workers — but only for directories that have no
cache entry yet. Directories with an existing entry are skipped with a single cache
read, workers validate those themselves exactly as today, and nothing else of the source
locator is initialized (no identifier is located, no composer processing, no stubbers).
That keeps the main thread as lazy as #5577 made it: fully warm runs spawn no workers and
never reach this code, and incremental runs pay one cache read.
Benchmarks
hyperfine, self-analysis of
src/Typeat level 8, M4 Pro / PHP 8.5.7, base = ff2647a(cold = tmpDir wiped per run; incremental = one file modified per run, caches warm):
PHPSTAN_PARALLEL_FORK=1The cold CPU savings on this corpus are modest because the analysed directory
(
src/Type) is small relative to the unanalysed classmap directories, which this PRdeliberately leaves alone. On projects where the analysed paths cover most of the
scanned tree, the deduplicated share grows accordingly.
Tests
make tests(12,711, green),make phpstan,make cs,make lintandmake composer-dependency-analyserall pass. Analysis output is byte-identical. Theresult-cache-restore-without-reflectione2e (ThrowingSourceLocator) passes — thepre-warm never locates an identifier.
🤖 Generated with Claude Code