Add negative constraint parsing to lexical search #2428
🎩 Preview: A preview build has been created at:
🤖 Code review: Add negative constraint parsing to lexical search

Nicely thought-through. The two design decisions that matter most are both correct and well-tested: negation is literal (negated terms are not synonym-expanded, so "not gcs" doesn't also drop storage/bucket components), and exclusion is a whole-token match rather than a substring match (a short negated token can't knock a component out by appearing inside an unrelated word). The conjunction-barrier in the capture group (…

The main thing to weigh:

Regex looks safe from catastrophic backtracking (word groups separated by required …
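The two matching rules the review highlights can be illustrated with a small TypeScript sketch. Everything here is hypothetical: `SYNONYMS`, `tokenize`, `expandTerms`, `isExcluded`, and `isExcludedBySubstring` are illustrative names rather than the PR's actual code, and the synonym table is invented for the example. Positive terms go through synonym expansion, while negated terms are matched literally and only as whole tokens.

```typescript
// Hypothetical synonym table, used only to expand *positive* query terms.
const SYNONYMS: Record<string, string[]> = {
  gcs: ["storage", "bucket"],
};

// Lowercase and split on runs of non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Positive terms: the term itself plus any synonyms.
function expandTerms(terms: string[]): Set<string> {
  const expanded = new Set<string>();
  for (const term of terms) {
    expanded.add(term);
    for (const syn of SYNONYMS[term] ?? []) expanded.add(syn);
  }
  return expanded;
}

// Negated terms: literal, whole-token comparison only
// (no synonym expansion, no substring matching).
function isExcluded(entryText: string, negatedTerms: string[]): boolean {
  const entryTokens = new Set(tokenize(entryText));
  return negatedTerms.some((term) => entryTokens.has(term.toLowerCase()));
}

// Substring matching, shown only for contrast: a short negated token
// such as "ai" would wrongly hit an unrelated word like "email".
function isExcludedBySubstring(entryText: string, negatedTerms: string[]): boolean {
  const lower = entryText.toLowerCase();
  return negatedTerms.some((term) => lower.includes(term.toLowerCase()));
}
```

With this sketch, `isExcluded("Upload to GCS", ["gcs"])` is `true`, but `isExcluded("Upload to storage bucket", ["gcs"])` is `false` because the negated term is not expanded. `isExcluded("Send email", ["ai"])` is `false`, whereas the substring variant returns `true` for the same input.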

Description
Adds support for negative constraints in lexical search queries. When a user includes phrases like "not GCS", "excluding GCS", or "without GCS" in their search, components matching those excluded terms are filtered out of the results. This allows users to express intent more naturally, such as "I want to upload a file but not to GCS", and receive only the relevant components.
This is implemented by parsing the query text before tokenization, extracting negative constraint phrases with a regex pattern, and assigning a score of zero to any index entry that matches a negative token. The word "but" has also been added to the stop-words list so that it doesn't interfere with scoring.
Related Issues and Pull Requests
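A minimal sketch of the flow described above, assuming a simple token-overlap scorer: negative phrases are pulled out of the raw query before tokenization, the remaining text is tokenized with stop words (including "but") removed, and any entry containing a negated token scores zero. The regex, the stop-word list, `parseQuery`, and `scoreEntry` are hypothetical; the trigger words mirror the ones listed under Additional Comments, but the PR's real pattern and scorer may differ.

```typescript
// Hypothetical stop-word list; "but" is included, as in this PR.
const STOP_WORDS = new Set(["a", "an", "the", "to", "i", "want", "but"]);

// Trigger word, optional connector, then the excluded term (captured).
// Hypothetical approximation of the PR's pattern. \b plus the required
// whitespace keep words like "note" from triggering on "no".
const NEGATIVE_PATTERN =
  /\b(?:without|excluding|exclude|not|no)\s+(?:(?:to|use|using)\s+)?([a-z0-9]+)/gi;

interface ParsedQuery {
  positive: string[];
  negative: string[];
}

function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Extract negative constraints *before* tokenizing, and strip them from the
// query so the excluded term does not also count as a positive match.
function parseQuery(query: string): ParsedQuery {
  const negative: string[] = [];
  const remainder = query.replace(NEGATIVE_PATTERN, (_match: string, term: string) => {
    negative.push(term.toLowerCase());
    return " ";
  });
  const positive = tokenize(remainder).filter((t) => !STOP_WORDS.has(t));
  return { positive, negative };
}

// Token-overlap score; zero if the entry contains any negated token.
function scoreEntry(entryText: string, parsed: ParsedQuery): number {
  const entryTokens = new Set(tokenize(entryText));
  if (parsed.negative.some((t) => entryTokens.has(t))) return 0;
  return parsed.positive.filter((t) => entryTokens.has(t)).length;
}
```

For "I want to upload a file but not to GCS", this sketch yields positive tokens `["upload", "file"]` and negative tokens `["gcs"]`, so an entry such as "Upload file to GCS" scores 0 while "Upload file to S3" scores 2. Parsing before tokenization is what keeps "gcs" out of the positive tokens; tokenizing first would have counted it as a match.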
Type of Change
Checklist
Screenshots (if applicable)
Test Instructions
"upload a file but not to GCS" or "upload a file excluding GCS".
Additional Comments
The negative constraint pattern currently recognises the trigger words "without", "excluding", "exclude", "not", and "no", optionally followed by a connecting word such as "to", "use", or "using".