Skip to content

fix(compiler): unwrap JSON-array LLM responses so concept/entity pages aren't silently dropped#89

Open
cnndabbler wants to merge 2 commits into
VectifyAI:mainfrom
cnndabbler:fix/concept-json-array
Open

fix(compiler): unwrap JSON-array LLM responses so concept/entity pages aren't silently dropped#89
cnndabbler wants to merge 2 commits into
VectifyAI:mainfrom
cnndabbler:fix/concept-json-array

Conversation

@cnndabbler

Copy link
Copy Markdown

Problem

The four page generators in compiler.py (_gen_create, _gen_update, _gen_entity_create, _gen_entity_update) do:

parsed = _parse_json(raw)
brief = parsed.get("brief", "")

When the model returns a JSON array instead of an object, _parse_json returns a list, and parsed.get(...) raises AttributeError: 'list' object has no attribute 'get'. That error is not caught by the surrounding except (json.JSONDecodeError, ValueError), so the whole coroutine dies and the page is silently dropped:

openkb.agent.compiler WARNING: Entity generation failed: 'list' object has no attribute 'get'
[WARN] N entity(ies) planned but only M written ... (AttributeError).

This is the 'list' object has no attribute 'get' failure reported in #71. It still reproduces on current main (observed across a 165-doc ingest with several models — deepseek, qwen — affecting ~20% of docs, each losing one or more pages).

Fix

Add _as_obj() next to _parse_json(): if the parsed value is a list, unwrap the first dict element (the common "model wrapped the object in a one-element array" case); otherwise raise ValueError so the callers' existing raw-body fallback applies. Wrap all four call sites as parsed = _as_obj(_parse_json(raw)).

No behavior change for the normal object case.

Tests

Adds TestAsObj covering dict passthrough, single-element unwrap, first-dict-in-list, and the list-without-dict / empty-list ValueError paths. Full suite green locally.

Relates to #71.

…ration

The 4 page generators did parsed = _parse_json(raw); parsed.get(...). When
the model returns a JSON array instead of an object, _parse_json returns a
list and .get() raises AttributeError, which is not caught by the
(JSONDecodeError, ValueError) handler -- so the whole page is dropped
('list' object has no attribute 'get'; 'N planned but only M written').

Add _as_obj() to unwrap the first dict from a list (else raise ValueError so
the existing raw-body fallback applies) and wrap all four call sites.

Relates to VectifyAI#80, VectifyAI#71.
@KylinMountain

Copy link
Copy Markdown
Collaborator

Nice, clean fix and good test coverage — _as_obj is correct and the four call sites correctly fall back via the existing except (json.JSONDecodeError, ValueError).

One thing before merge: the fix is incomplete — the same .get-on-a-list crash still lives in the summary step of compile_short_doc, which this PR does not wrap:

try:
summary_parsed = _parse_json(summary_raw)
doc_brief = summary_parsed.get("brief", "")
summary = summary_parsed.get("content", summary_raw)
except (json.JSONDecodeError, ValueError):
doc_brief = ""
summary = summary_raw

summary_parsed = _parse_json(summary_raw)
doc_brief = summary_parsed.get("brief", "")          # AttributeError when the model returns a JSON array
summary = summary_parsed.get("content", summary_raw)
except (json.JSONDecodeError, ValueError):           # does not catch AttributeError

It uses the same _parse_json, the same models, and the same _JSON_RESPONSE_FORMAT, so the array-wrapping case from #71 is just as likely here. And the blast radius is larger: the uncaught AttributeError propagates out of compile_short_doc, killing the whole short-doc compile rather than dropping a single page. Suggest applying the same _as_obj(_parse_json(summary_raw)) here.

(For reference, the plan-parse site is already safe — it guards with an isinstance(parsed, (list, dict)) check before calling .get.)

Minor / optional:

  • _as_obj silently discards all but the first dict when the list has multiple objects. Fine for the one-element wrap case, just noting it.
  • Since five sites now want "parse to a dict", a small _parse_json_obj(raw) wrapper would dedupe the call sites and naturally cover the summary path so it cant be missed again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants