Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063
Draft
vup903 wants to merge 4 commits into
Draft
Add unified parse_json runtime type checker (#3285) [draft / proof-of-direction]#4063vup903 wants to merge 4 commits into
vup903 wants to merge 4 commits into
Conversation
Introduce zarr.core.json_parse.parse_json, a single type-annotation-driven validator that consolidates the scattered per-field parse_* helpers. Handles primitives, Literal, unions/Optional, fixed and variadic tuples, Sequence/list (coerced to tuple), Mapping/dict, and TypedDict, with a bool-vs-int safe primitive check. Adds tests/test_json_parse.py (94 tests) and a changelog fragment. No existing call sites migrated yet; this is the proof-of-direction module. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r-developers#3285) Pilot migration delegating three representative helpers to the unified parse_json validator. Public signatures and return types are unchanged. parse_zarr_format re-wraps parse_json's ValueError/TypeError as MetadataValidationError with the original message to preserve observable behavior. parse_json is imported function-locally to avoid circular imports. Focused suites green: test_common/test_config/test_metadata/test_json_parse = 430 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ns (zarr-developers#3285) Apply ruff check/format to satisfy the Lint CI hook. Keep typing.Union/Optional spellings in tests (with noqa) to cover that origin path alongside X | Y. Rework _parse_typeddict to derive required/optional from get_type_hints(include_extras=True) + __total__ instead of __required_keys__, so class-syntax NotRequired is detected even when 'from __future__ import annotations' stringizes the hints (as in zarr's metadata modules). test_json_parse + test_common + test_metadata = 393 passed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lopers#3285) get_origin(hint) is Required/NotRequired tripped mypy's comparison-overlap and unreachable checks; typing origin as Any keeps the runtime check while satisfying mypy. Full 'uv run --frozen mypy' is clean (190 files); ruff check/format clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This is a draft / proof-of-direction PR for #3285, opened to show the intended approach and get early feedback before migrating all call sites — per @d-v-b's request in the issue thread.
It introduces a single unified runtime type checker,
parse_json(value, type_annotation), that consolidates the scattered per-fieldparse_*helpers (there are currently ~42 of them acrosssrc/zarr). The function validates a JSON-decoded value against a type annotation and returns data assignable to that annotation (coercing where sensible, e.g.parse_json([1, 2, 3], tuple[int, int, int]) -> (1, 2, 3)), or raises a clear error.What's in this draft
src/zarr/core/json_parse.py—parse_jsonplus sub-parsers, scoped to the JSON-relevant categories named in the issue:None,str,int,float,bool(with a bool-vs-int-safe check:Trueis not accepted forint, and1is not accepted forbool)Literal[...]OptionaltupleSequence[T]/list[T]→ returned as a tupleMapping[str, T]/dict[str, T]TypedDicttests/test_json_parse.py— 94 tests covering every category, incl. the bool/int edge cases, coercion, unions, nested TypedDict.parse_json, with public signatures and behavior unchanged:core/common.py::parse_order→parse_json(data, Literal["C", "F"])core/common.py::parse_bool→parse_json(data, bool)core/metadata/v3.py::parse_zarr_format→parse_json(data, Literal[3]), re-wrapping the error asMetadataValidationErrorto preserve the original exception type and message.Focused suites pass locally (Python 3.12,
test.py3.12-optionalenv):test_common,test_config,test_metadata, andtest_json_parse→ 430 passed.Open questions for maintainers (direction feedback)
src/zarr/core/json_parse.pyis a placeholder; happy to move it (e.g. into a typing/metadata utility module).parse_jsonraises plainValueError/TypeErrorsince it's field-agnostic; field-specific callers (likeparse_zarr_format) re-wrap into their domain error. Does that split match what you'd want?NotRequired— the current_parse_typeddictreads__required_keys__/__optional_keys__directly. Underfrom __future__ import annotationswith class-syntax TypedDicts,typing_extensionscan't see theNotRequiredwrapper at class-creation time, so an optional key can be mis-classified as required. The robust fix is to resolve hints viatyping_extensions.get_type_hints(..., include_extras=True)and inspect forRequired/NotRequired. I've left this for the follow-up once the overall direction is confirmed, since none of the three pilot helpers are TypedDicts.If the direction looks right, I'll migrate the remaining
parse_*call sites incrementally in follow-up commits.Closes #3285 once fully migrated (draft for now).