Clamp DiskANN int8 quantization to avoid float->int8 UB #311

Open
stumpylog wants to merge 1 commit into asg017:main from stumpylog:fix/diskann-int8-quantize-clamp

@stumpylog

Problem

diskann_quantize_vector's VEC0_DISKANN_QUANTIZER_INT8 path casts the quantized float directly to i8:

((i8 *)out)[i] = (i8)(((src[i] - (-1.0f)) / step) - 128.0f);

When the input vector is not normalized to [-1, 1], the intermediate value can fall outside [-128, 127]. Converting a floating-point value whose integral part is not representable in the target integer type is undefined behavior in C (C11 6.3.1.4); in practice it typically wraps to a garbage int8 instead of saturating.
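
To make the range problem concrete, here is a minimal sketch that computes the intermediate value without performing the cast. The definition of `step` is not shown in this PR, so `ASSUMED_STEP` (256 levels spread over [-1, 1]) is an assumption, and both helper names are hypothetical.

```c
/* Assumption: the PR does not show how `step` is defined. This value
   (the [-1, 1] range divided into 255 intervals) is illustrative only. */
#define ASSUMED_STEP (2.0f / 255.0f)

/* Hypothetical helper: the float value that the original code casts to i8. */
static float int8_raw_value(float x) {
    return ((x - (-1.0f)) / ASSUMED_STEP) - 128.0f;
}

/* Hypothetical helper: float -> signed 8-bit conversion truncates toward
   zero, so it is defined only when the truncated value lies in [-128, 127],
   i.e. when raw is strictly between -129 and 128. NaN fails both
   comparisons and is reported as not convertible. */
static int conversion_is_defined(float raw) {
    return raw > -129.0f && raw < 128.0f;
}
```

Under this assumed step, an input of -1.0f maps to exactly -128, while inputs such as 2.0f or -1.9f land far outside the convertible range.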

Fix

Saturate the value to [-128, 127] before the cast, so out-of-range inputs quantize to the nearest representable int8:

const f32 raw = ((src[i] - (-1.0f)) / step) - 128.0f;
const f32 q = raw < -128.0f ? -128.0f : (raw > 127.0f ? 127.0f : raw);
((i8 *)out)[i] = (i8)q;
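
As a standalone illustration of the same clamp, the sketch below wraps it in a small function that can be compiled and tested outside the extension. It uses `int8_t` and `float` in place of the project's `i8` and `f32` typedefs. `ASSUMED_STEP` and both function names are hypothetical, since the PR does not show how `step` is defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the PR does not show how `step` is defined. This value
   (the [-1, 1] range divided into 255 intervals) is illustrative only. */
#define ASSUMED_STEP (2.0f / 255.0f)

/* Hypothetical helper: saturate to [-128, 127] before the cast, mirroring
   the clamp in this PR. NaN fails both comparisons and passes through
   unchanged, so this sketch does not make NaN inputs well-defined. */
static int8_t quantize_int8_saturating(float x) {
    const float raw = ((x - (-1.0f)) / ASSUMED_STEP) - 128.0f;
    const float q = raw < -128.0f ? -128.0f : (raw > 127.0f ? 127.0f : raw);
    return (int8_t)q; /* q is within [-128, 127] for any non-NaN input */
}

/* Hypothetical helper: quantize n floats from src into out. */
static void quantize_vector_int8(const float *src, int8_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = quantize_int8_saturating(src[i]);
    }
}
```

With this sketch, inputs below -1 saturate to -128 and inputs well above 1 saturate to 127, rather than depending on what the platform does with an out-of-range conversion.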

How it was found

An AddressSanitizer + UndefinedBehaviorSanitizer run of the loadable test suite flagged the cast in test_diskann_insert_int8_quantizer_knn:

sqlite-vec-diskann.c:160:26: runtime error: -242.563 is outside the range of representable values of type 'signed char'
    #0 diskann_quantize_vector  sqlite-vec-diskann.c:160:26
    #1 diskann_quantize_query   sqlite-vec-diskann.c:210:3
    #2 diskann_search           sqlite-vec-diskann.c:752:22

diskann_quantize_query is the only other caller and goes through the same function, so both the index-build and query paths are covered by the one fix. No behavior change for already-normalized inputs.

🤖 Generated with Claude Code

diskann_quantize_vector's INT8 path cast the quantized float directly to i8.
When the input vector is not normalized to [-1, 1] the intermediate value
falls outside [-128, 127], and converting an out-of-range float to a signed
integer type is undefined behavior in C (in practice it wraps to garbage).

Saturate the value to [-128, 127] before the cast so out-of-range inputs
quantize to the nearest representable int8. Surfaced by an ASan/UBSan run of
the loadable test suite (test_diskann_insert_int8_quantizer_knn).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>