fix(compiler): unwrap JSON-array LLM responses so concept/entity pages aren't silently dropped #89
cnndabbler wants to merge 2 commits into
…ration
The 4 page generators did parsed = _parse_json(raw); parsed.get(...). When
the model returns a JSON array instead of an object, _parse_json returns a
list and .get() raises AttributeError, which is not caught by the
(JSONDecodeError, ValueError) handler -- so the whole page is dropped
('list' object has no attribute 'get'; 'N planned but only M written').
Add _as_obj() to unwrap the first dict from a list (else raise ValueError so
the existing raw-body fallback applies) and wrap all four call sites.
Relates to VectifyAI#80, VectifyAI#71.
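The failure mode described in the commit message can be reproduced in isolation. The following is a minimal sketch, not the code from compiler.py: `parse_page` is a hypothetical stand-in for one generator's parse step, and `json.loads` stands in for `_parse_json`, whose real implementation is not shown here.

```python
import json


def parse_page(raw: str) -> str:
    """Hypothetical stand-in for the parse step of one page generator."""
    try:
        parsed = json.loads(raw)           # stand-in for _parse_json(raw)
        return parsed.get("content", raw)  # raises AttributeError if parsed is a list
    except (json.JSONDecodeError, ValueError):
        return raw                         # raw-body fallback


# An object response parses normally.
assert parse_page('{"content": "page body"}') == "page body"

# Malformed JSON reaches the fallback, because JSONDecodeError is handled.
assert parse_page("not json") == "not json"

# A JSON array escapes the handler: AttributeError is not a ValueError.
error_message = ""
try:
    parse_page('[{"content": "page body"}]')
except AttributeError as exc:
    error_message = str(exc)
```

Because `json.JSONDecodeError` is a subclass of `ValueError` while `AttributeError` is not, only the array case propagates out of the function.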
Nice, clean fix and good test coverage. One thing before merge: the fix is incomplete. The same unguarded pattern remains at the summary-parse site:
OpenKB/openkb/agent/compiler.py, lines 1992 to 1999 in 1ad8189
    summary_parsed = _parse_json(summary_raw)
    doc_brief = summary_parsed.get("brief", "")  # AttributeError when the model returns a JSON array
    summary = summary_parsed.get("content", summary_raw)
except (json.JSONDecodeError, ValueError):  # does not catch AttributeError
It uses the same … (For reference, the plan-parse site is already safe; it guards with an …)
Minor / optional:
Problem
The four page generators in `compiler.py` (`_gen_create`, `_gen_update`, `_gen_entity_create`, `_gen_entity_update`) call `parsed = _parse_json(raw)` and then `parsed.get(...)`. When the model returns a JSON array instead of an object, `_parse_json` returns a `list`, and `parsed.get(...)` raises `AttributeError: 'list' object has no attribute 'get'`. That error is not caught by the surrounding `except (json.JSONDecodeError, ValueError)`, so the whole coroutine dies and the page is silently dropped.
This is the `'list' object has no attribute 'get'` failure reported in #71. It still reproduces on current `main`: observed across a 165-doc ingest with several models (deepseek, qwen), it affected ~20% of docs, each losing one or more pages.
Fix
Add `_as_obj()` next to `_parse_json()`: if the parsed value is a list, unwrap the first dict element (the common "model wrapped the object in a one-element array" case); otherwise raise `ValueError` so the callers' existing raw-body fallback applies. Wrap all four call sites as `parsed = _as_obj(_parse_json(raw))`.
No behavior change for the normal object case.
Tests
Adds `TestAsObj` covering dict passthrough, single-element unwrap, first-dict-in-list, and the list-without-dict / empty-list `ValueError` paths. Full suite green locally.
Relates to #71.