Update latest-results with post-fix benchmark run by kinto0 · Pull Request #26 · microsoft/python-lsp-compare

kinto0 · 2026-04-22T00:06:07Z

Summary

Re-ran full unfiltered bench-servers after the tsp_core config fix
Filtered runs (--server/--protocol) intentionally skip updating latest-results.md, so it still showed the pre-fix Pyrefly failure
Updated latest-results.md and per-server JSON/JSONL result files
Also includes the _install_state_path() symlink fix from #24 (Fix install state path resolving symlinks to unwritable system Python) and the config fix from #25 (Fix tsp_core generic specialization benchmark pointing at empty line)
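
The filtered-run rule in the summary can be sketched as a simple guard. This is only an illustration: the function name and parameters are hypothetical and are not taken from the repository. It assumes "filtered" means that either --server or --protocol was passed.

```python
from typing import Optional


def should_update_latest_results(server: Optional[str], protocol: Optional[str]) -> bool:
    """Hypothetical guard: only a full, unfiltered run may overwrite latest-results.md.

    `server` and `protocol` stand in for the values of --server and --protocol;
    None means the flag was not given.
    """
    return server is None and protocol is None
```

Under this assumption a run such as `bench-servers --server pyrefly --protocol tsp` leaves latest-results.md untouched, which is why the file kept showing the pre-fix failure until a full run was done.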

Stack: Includes #24 and #25. Merge those first, then this PR's diff is just the regenerated latest-results files.

Test plan

Before: latest-results.md shows: Pyrefly | no | 221.27 | ... | 1 (tsp_core failed point)
After: latest-results.md shows: Pyrefly | yes | 316.80 | 0.17 | 8 | 40 | 100% | 0
All 4 servers benchmarked. Pyrefly: 0 failures across 8 suites. pylsp-mypy: 5 failures (pre-existing Jedi issues, unrelated).

_install_state_path() called Path.resolve() on the suite venv's python executable, which follows symlinks to the base interpreter. On systems where the base interpreter lives in a read-only location (e.g. a system Python framework at /usr/local/fbcode/), this causes a PermissionError when writing the per-suite install state file. Drop .resolve() so the state file is written relative to the venv path, which is always writable.

Test plan:

Before: bench-servers crashes immediately with PermissionError: [Errno 1] Operation not permitted: '/usr/local/fbcode/platform010/Python3.12.framework/... /.python-lsp-compare-install.json'

After: bench-servers completes successfully across all 4 servers. Pyrefly: 1 failed point (tsp_core/generic specialization; a pre-existing config bug, not caused by this change). 43 unit tests pass.
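
The symlink behaviour behind this fix can be reproduced with a small sketch. The two helper functions below are hypothetical stand-ins and not the repository's actual _install_state_path(). Placing the state file next to the executable is also an assumption made for illustration; only the file name comes from the error message above. The sketch needs a platform that allows creating symlinks.

```python
import os
import tempfile
from pathlib import Path

STATE_FILE = ".python-lsp-compare-install.json"


def state_path_resolved(python_exe: Path) -> Path:
    # Pre-fix behaviour as described: resolve() follows the venv symlink
    # to the base interpreter, so the state file lands beside the base Python.
    return python_exe.resolve().parent / STATE_FILE


def state_path_unresolved(python_exe: Path) -> Path:
    # Post-fix behaviour as described: no resolve(), so the path stays in the venv.
    return python_exe.parent / STATE_FILE


def demo() -> tuple[Path, Path]:
    # Build a fake "base interpreter" and a fake venv whose python is a symlink to it.
    root = Path(tempfile.mkdtemp()).resolve()
    base_bin = root / "base" / "bin"
    venv_bin = root / "venv" / "bin"
    base_bin.mkdir(parents=True)
    venv_bin.mkdir(parents=True)
    (base_bin / "python3").write_text("")
    os.symlink(base_bin / "python3", venv_bin / "python")
    exe = venv_bin / "python"
    return state_path_resolved(exe), state_path_unresolved(exe)
```

Calling demo() returns one path under the fake base directory and one under the fake venv. If the base directory were read-only, only the first would fail on write.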

The "generic specialization computed type" benchmark point targeted line 7 (0-indexed) in generics.py, which is an empty line. The actual expression `text = identity("hello")` is on line 9. Pyrefly's getComputedType correctly returned null for the empty line, causing requireNonEmpty validation to fail on every iteration. This was the sole Pyrefly failure across all 8 benchmark suites and has been present since the original tsp_core commit (800936d).

Fix: change start_line and end_line from 7 to 9.

Test plan:

Before: bench-servers --server pyrefly --protocol tsp shows:
pyrefly: [tsp_core] generic specialization computed type failed: Result validation failed: iteration 1: empty result; iteration 1: size_chars=0 < 10 (repeated for all 5 iterations)
tsp_core: failed (1 failed point out of 8)

After: bench-servers --server pyrefly --protocol tsp shows:
pyrefly: [tsp_core] generic specialization computed type ok
tsp_core: ok (0 failed points, 8/8 pass)

Pyrefly total: 0 failures across all 8 benchmark suites. 43 unit tests pass.
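
A config check along the following lines could catch this class of off-by-N error before a benchmark run. It is a sketch only. SAMPLE_LINES is a made-up stand-in for generics.py, arranged so that 0-indexed line 7 is blank and line 9 holds the expression. The function name and the shape of `points` are hypothetical.

```python
# Hypothetical stand-in for generics.py; the real file's contents are not shown in this PR.
SAMPLE_LINES = [
    "from typing import TypeVar",    # 0
    "",                              # 1
    'T = TypeVar("T")',              # 2
    "",                              # 3
    "",                              # 4
    "def identity(value: T) -> T:",  # 5
    "    return value",              # 6
    "",                              # 7  <- old target: empty line
    "",                              # 8
    'text = identity("hello")',      # 9  <- intended target
]


def points_on_blank_lines(lines: list[str], points: dict[str, int]) -> list[str]:
    """Return names of benchmark points whose 0-indexed start_line is blank or out of range."""
    bad = []
    for name, start_line in points.items():
        if not 0 <= start_line < len(lines) or not lines[start_line].strip():
            bad.append(name)
    return bad
```

With the sample above, a point at start_line 7 is flagged and a point at start_line 9 is not, mirroring the change from 7 to 9.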


kinto0 added 3 commits April 21, 2026 17:52

kinto0 closed this Apr 22, 2026

kinto0 wants to merge 3 commits into microsoft:main from kinto0:fix/update-latest-results
