Skip to content

MDEV-39999 Fix JSON path quoting for keys with special characters#5240

Open
akshatnehra wants to merge 1 commit into
MariaDB:mainfrom
akshatnehra:MDEV-39999
Open

MDEV-39999 Fix JSON path quoting for keys with special characters#5240
akshatnehra wants to merge 1 commit into
MariaDB:mainfrom
akshatnehra:MDEV-39999

Conversation

@akshatnehra

Copy link
Copy Markdown
Contributor

Description

This PR fixes MDEV-39999 where JSON_SEARCH() and other functions using append_json_path() return unquoted paths for keys containing special characters, producing paths that cannot be parsed back.

For example, a key "a.b" is emitted as $.a.b which the JSON path parser interprets as key "a" followed by key "b" — making it impossible to round-trip JSON_SEARCHJSON_VALUE.

The fix involves:

  1. Detecting keys containing path-ambiguous characters (., [, ], ", space, *, \) in append_json_path()
  2. Emitting such keys as quoted identifiers (e.g., $."a.b") with proper escaping of embedded double-quotes and backslashes

How can this PR be tested?

Run the MTR test:

build/mysql-test/mtr main.mdev_39999

Or verify the round-trip works:

SELECT JSON_VALUE('{"a.b":"found"}', JSON_UNQUOTE(JSON_SEARCH('{"a.b":"found"}', 'one', 'found'))) AS val;
-- Should return: found

Results from my testing

  1. Before changes
MariaDB> SELECT JSON_SEARCH('{"a.b":"x"}', 'one', 'x') AS path;
+--------+
| path   |
+--------+
| "$.a.b"|
+--------+

MariaDB> SELECT JSON_SEARCH('{"a[0]":"x"}', 'one', 'x') AS path;
+----------+
| path     |
+----------+
| "$.a[0]" |
+----------+

MariaDB> SELECT JSON_SEARCH('{"hello world":"x"}', 'one', 'x') AS path;
+----------------+
| path           |
+----------------+
| "$.hello world"|
+----------------+

These paths are unparseable — $.a.b means key a then key b, not key a.b.

  1. After changes
MariaDB> SELECT JSON_SEARCH('{"a.b":"x"}', 'one', 'x') AS path;
+--------------+
| path         |
+--------------+
| "$.\"a.b\""  |
+--------------+

MariaDB> SELECT JSON_SEARCH('{"a[0]":"x"}', 'one', 'x') AS path;
+----------------+
| path           |
+----------------+
| "$.\"a[0]\""   |
+----------------+

MariaDB> SELECT JSON_SEARCH('{"hello world":"x"}', 'one', 'x') AS path;
+----------------------+
| path                 |
+----------------------+
| "$.\"hello world\""  |
+----------------------+

Paths are now properly quoted and can be used with JSON_VALUE/JSON_EXTRACT to access the original key.

Basing the PR against the correct MariaDB version

  • This is a bug fix and the PR is based against the latest MariaDB development branch.

PR quality check

  • I have checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am fine with the reviewer making the changes themselves.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses MDEV-39999 by quoting keys containing special characters in JSON paths generated by JSON_SEARCH. The reviewer feedback points out that the blacklist-based approach used to detect keys requiring quoting is prone to missing edge cases, such as empty keys, keys starting with a digit, or other special characters like hyphens. It is recommended to adopt a whitelist-based approach to ensure only safe alphanumeric identifiers starting with a letter or underscore are left unquoted.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread sql/item_jsonfunc.cc Outdated
Comment on lines +4383 to +4392
bool needs_quote= false;
for (size_t i= 0; i < k_len; i++)
{
if (k[i] == '.' || k[i] == '[' || k[i] == ']' ||
k[i] == '"' || k[i] == ' ' || k[i] == '*' || k[i] == '\\')
{
needs_quote= true;
break;
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current blacklist-based approach for detecting keys that need quoting is prone to missing several edge cases:

  1. Empty keys (k_len == 0): Currently, an empty key is not quoted, resulting in the invalid JSON path $. (as seen in the test results: "$."). It should be quoted as $."".
  2. Keys starting with a digit: Keys starting with a digit (e.g., 1abc) are not quoted, which is invalid in standard JSON path syntax.
  3. Other special characters: Characters like hyphens (-), slashes (/), or @ are not in the blacklist and will be left unquoted, producing invalid paths (e.g., $.a-b).

Using a whitelist approach (only allowing safe alphanumeric/underscore identifiers starting with a letter or underscore) is much more robust and correctly handles all these cases.

      bool needs_quote= (k_len == 0);
      if (!needs_quote)
      {
        char first= k[0];
        if (!((first >= 'a' && first <= 'z') ||
              (first >= 'A' && first <= 'Z') ||
              first == '_'))
        {
          needs_quote= true;
        }
        else
        {
          for (size_t i= 1; i < k_len; i++)
          {
            char ch= k[i];
            if (!((ch >= 'a' && ch <= 'z') ||
                  (ch >= 'A' && ch <= 'Z') ||
                  (ch >= '0' && ch <= '9') ||
                  ch == '_'))
            {
              needs_quote= true;
              break;
            }
          }
        }
      }

append_json_path() emits keys containing path-ambiguous characters
(dot, brackets, quotes, space, asterisk, backslash) without quoting,
producing paths that cannot be parsed back correctly. For example,
a key "a.b" is emitted as $.a.b which is interpreted as key "a"
then key "b".

Fix: detect keys containing special characters and emit them as
quoted identifiers (e.g. $."a.b"). Backslash and double-quote
inside keys are escaped with a preceding backslash.

All new code of the whole pull request, including one or several files
that are either new files or modified ones, are contributed under the
BSD-new license. I am contributing on behalf of my employer Amazon Web
Services, Inc.
@gkodinov gkodinov added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Jun 16, 2026
@gkodinov gkodinov self-assigned this Jun 16, 2026

@gkodinov gkodinov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! This is a preliminary review.

Please consider fixing the following (in addition to the notes below):

  • rebase on 10.11: this is a bug fix. It needs to go to the lowest affected GA version
  • fix the buildbot failures. I can currently see just one:
main.func_json                           w14 [ fail ]
        Test ended at 2026-06-15 19:52:27
CURRENT_TEST: main.func_json
--- /home/buildbot/aarch64-debian-11/build/mysql-test/main/func_json.result	2026-06-15 19:33:34.000000000 +0000
+++ /home/buildbot/aarch64-debian-11/build/mysql-test/main/func_json.reject	2026-06-15 19:52:27.534594656 +0000
@@ -596,7 +596,7 @@
 []
 SELECT JSON_search( '{"": "a"}', "one", 'a');
 JSON_search( '{"": "a"}', "one", 'a')
-"$."
+"$.\"\""
 select json_merge('{"a":"b"}', '{"a":"c"}') as ex ;
 ex
 {"a": ["b", "c"]}
Result length mismatch
 - saving '/home/buildbot/aarch64-debian-11/build/mysql-test/var/14/log/main.func_json/' to '/home/buildbot/aarch64-debian-11/build/mysql-test/var/log/main.func_json/'
Retrying test main.func_json, attempt(2/3)...

Comment thread sql/item_jsonfunc.cc
return TRUE;
if (needs_quote)
{
if (str->append('\\') || str->append('"'))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please consider using st_append_escaped() instead. See append_json_value_from_field(). And produce a "bracketed-selection" (see above).

Comment thread sql/item_jsonfunc.cc
bool needs_quote= (k_len == 0);
for (size_t i= 0; i < k_len; i++)
{
if (k[i] == '.' || k[i] == '[' || k[i] == ']' ||

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's more complex than this I'm afraid. https://www.rfc-editor.org/info/rfc9535/#segments-details defines what is a valid "shorthand" notation as follows:

child-segment       = bracketed-selection /
                      ("."
                       (wildcard-selector /
                        member-name-shorthand))

bracketed-selection = "[" S selector *(S "," S selector) S "]"

member-name-shorthand = name-first *name-char
name-first          = ALPHA /
                      "_"   /
                      %x80-D7FF /
                         ; skip surrogate code points
                      %xE000-10FFFF
name-char           = name-first / DIGIT

DIGIT               = %x30-39              ; 0-9
ALPHA               = %x41-5A / %x61-7A    ; A-Z / a-z

This is implemented by json_escape() in the MariaDB code I believe.

So here's what I would suggest: either always produce bracketed-selection (see above) or call st_append_escaped (or json_escape directly), see if the result is different from the source string and if it is, go with the bracketed expression that you have, otherwise do the unescaped source string.

This will have the disadvantage that you will always produce a bracketed escaped string. But I believe this is not such a big deal performance wise.

You could of course consider implementing a non-outputting version of json_escape() and call that json_needs_escaping() for example. But I believe it needs to be done along the same lines at least.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

2 participants