fix(docs-vec): make the hybrid driver buildable and runnable on stock PHP #130
Merged
… PHP

The docs-vec audit had flagged six defects and a presumed structural wall (PDO cannot load extensions). The wall was a wrong API call, not a platform limit: Pdo\Sqlite::loadExtension() works on stock PHP 8.5. Every defect was code/packaging.

Fixes:
- module.php: stop double-binding DocsSearchInterface (bindings => []) and register VecRuntime + the download commands as singletons with $packageRoot
- VecRuntime: load sqlite-vec via Pdo\Sqlite::loadExtension() (not SELECT load_extension), probe extension-loading support, memoize the embedding pipeline (fixes the indexing OOM)
- new docs-vec:download-extension command: pinned sqlite-vec v0.1.9, SHA-256-verified per platform (macOS/Linux/Windows × x86_64/arm64), streamed
- DownloadModelCommand: stream the ONNX model to disk (no 128MB OOM)
- graceful FTS5-only fallback when the extension/model/transformers are absent
- composer suggest: codewithkyrian/transformers-php -> codewithkyrian/transformers (^0.5 || ^0.6); fix HF model layout + transformers pipeline usage
- sanitize the FTS half's NL queries (parity with docs-fts #127), co-located because marko/docs is contract-only
- downloaded binary/model are gitignored; docs updated (drivers, package page, README)

Verified: real hybrid build over the docs corpus returns all 8 expected docs in top-3 (V = 8/8), matching docs-fts.
89 passed / 6 skipped / 0 failed across docs, docs-fts, docs-vec.

docs-fts remains the recommended zero-infra default.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
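For readers unfamiliar with the API change, the extension-loading probe mentioned in the commit message might look roughly like the sketch below. It is not the actual VecRuntime code: tryLoadVecExtension() and the example path are hypothetical names, and the method_exists() check assumes that builds compiled without extension loading simply do not expose loadExtension().

```php
<?php
declare(strict_types=1);

/**
 * Try to load a SQLite extension into a fresh in-memory database.
 * Returns the connection on success, or null when the capability or the
 * file is missing, so the caller can degrade to FTS5-only search.
 *
 * Hypothetical helper for illustration only.
 */
function tryLoadVecExtension(string $extensionPath): ?PDO
{
    // Pdo\Sqlite is only available on PHP 8.4+ with pdo_sqlite enabled.
    if (!class_exists(\Pdo\Sqlite::class)) {
        return null;
    }

    // Probe (assumption): builds without extension loading omit the method.
    if (!method_exists(\Pdo\Sqlite::class, 'loadExtension')) {
        return null;
    }

    try {
        $db = new \Pdo\Sqlite('sqlite::memory:');
        // Native API call, used instead of the SQL-level load_extension().
        $db->loadExtension($extensionPath);

        return $db;
    } catch (\Throwable) {
        // Missing file, wrong platform binary, etc.: signal the fallback.
        return null;
    }
}

// Example path is made up; a missing file takes the fallback branch.
$db = tryLoadVecExtension(__DIR__ . '/does-not-exist/vec0.so');
echo $db === null ? "FTS5-only fallback\n" : "sqlite-vec loaded\n";
```

Returning null instead of throwing keeps the fallback decision in one place: the caller only has to check whether it received a connection.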
Summary

The docs-vec audit (#128) flagged six defects plus a presumed structural wall: "PDO can't load SQLite extensions on stock PHP." The wall was a wrong API call, not a platform limit: Pdo\Sqlite::loadExtension() works on stock Homebrew PHP 8.5. Every defect turned out to be code/packaging, and the driver is now functional end-to-end.

Fixes (closes #128)
- module.php bound DocsSearchInterface in both bindings and singletons → BindingConflictException. Now bindings => [] + singleton factory.
- $packageRoot: VecRuntime + the download commands are now registered as singletons with __DIR__.
- DownloadModelCommand streams the ONNX model to disk (stream_copy_to_stream).
- docs-vec:download-extension command: pinned v0.1.9, SHA-256-verified before extraction, fail-closed, per-platform (macOS/Linux/Windows × x86_64/arm64).
- suggest-only transformers: codewithkyrian/transformers-php → codewithkyrian/transformers (^0.5 || ^0.6); kept optional with graceful FTS5-only fallback when extension/model/transformers are absent.
- VecRuntime now loads sqlite-vec via Pdo\Sqlite::loadExtension() instead of the blocked SELECT load_extension(...), with a probe that degrades to FTS5 on builds that compile the capability out.

Plus latent bugs the never-run embedding path hid: HF model layout (onnx/ subdir + tokenizer files), pipeline() usage + mean-pooling, per-call model reload (the indexing OOM), vec0 integer-PK bind. The FTS half now sanitizes NL queries (parity with #127), co-located because marko/docs is contract-only.

Verification
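- Real hybrid build over the docs corpus returns all 8 expected docs in top-3 (V = 8/8), matching docs-fts.
- 89 passed / 6 skipped / 0 failed across docs, docs-fts, docs-vec. phpcs + php-cs-fixer clean.
- composer.json untouched.

docs-fts remains the recommended zero-infra default; docs-vec is now a working opt-in semantic option.

🤖 Generated with Claude Code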
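
As an illustration of the streamed, fail-closed download described under Fixes, here is a minimal sketch. It is not the code of docs-vec:download-extension: fetchVerified() and the .part suffix are hypothetical, and the per-platform checksum lookup and archive extraction that the real command performs are left out.

```php
<?php
declare(strict_types=1);

/**
 * Stream $source to $targetPath and keep the file only if its SHA-256
 * matches $expectedSha256. Hypothetical helper for illustration only.
 * $source may be a local path or, with allow_url_fopen enabled, a URL.
 */
function fetchVerified(string $source, string $targetPath, string $expectedSha256): void
{
    $in = @fopen($source, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open {$source}");
    }

    $tmpPath = $targetPath . '.part';
    $out = @fopen($tmpPath, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot write {$tmpPath}");
    }

    // Copies chunk by chunk, so the payload is never held fully in memory.
    $copied = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);

    if ($copied === false) {
        @unlink($tmpPath);
        throw new RuntimeException('Download failed mid-stream');
    }

    // Fail closed: verify before the file becomes visible at its final path.
    $actual = hash_file('sha256', $tmpPath);
    if ($actual === false || !hash_equals(strtolower($expectedSha256), $actual)) {
        @unlink($tmpPath);
        throw new RuntimeException('Checksum mismatch; refusing to install download');
    }

    if (!rename($tmpPath, $targetPath)) {
        @unlink($tmpPath);
        throw new RuntimeException("Cannot move download to {$targetPath}");
    }
}
```

Writing to a temporary path and renaming only after the hash check means a failed or tampered download never appears at the final location, which is the property "fail-closed" refers to.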