From: | Alexander Korotkov <aekorotkov(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | shammat(at)gmx(dot)net, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: BUG #16828: duplicate results when using ** recursive expression in JSON path |
Date: | 2021-01-17 21:24:19 |
Message-ID: | CAPpHfdtGcnt8p7pRi6s6qEszqQbQkDjq5pWQGRh_EDExuTW0QA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Sun, Jan 17, 2021 at 10:42 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > The following query
>
> > select jsonb_path_query_array(col, '$.**.itemName')
> > from (
> > values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> > ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each
value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439. So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink. (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag? That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)
I think that's not just a bug in the code, but a high-level problem.
** operator enumerates all the subdocuments.
# select * from jsonb_path_query('{"items": [{"itemName": "a", "items":
[{"itemName": "b"}]}]}'::jsonb, '$.**');
jsonb_path_query
--------------------------------------------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
[{"items": [{"itemName": "b"}], "itemName": "a"}]
{"items": [{"itemName": "b"}], "itemName": "a"}
[{"itemName": "b"}]
{"itemName": "b"}
"b"
"a"
(7 rows)
After that, .itemName accessor is applied. But in the lax mode this
accessor automatically unwraps the arrays.
# select subres, jsonb_path_query_array(subres, '$.itemName') from
jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName":
"b"}]}]}'::jsonb, '$.**') subres;
subres |
jsonb_path_query_array
--------------------------------------------------------------+------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
[{"items": [{"itemName": "b"}], "itemName": "a"}] | ["a"]
{"items": [{"itemName": "b"}], "itemName": "a"} | ["a"]
[{"itemName": "b"}] | ["b"]
{"itemName": "b"} | ["b"]
"b" | []
"a" | []
(7 rows)
So, everything works as designed, but the design is probably wrong. The
issue here is that ** is our extension of the standard, and it works
strangely in lax mode. But in strict mode everything looks OK.
# select jsonb_path_query_array('{"items": [{"itemName": "a", "items":
[{"itemName": "b"}]}]}'::jsonb, 'strict $.**.itemName');
jsonb_path_query_array
------------------------
["a", "b"]
(1 row)
Should we consider everything that comes after ** always in strict mode?
> Alexander, git blame says all this code is your fault ...
I do agree all faults in this code are mine. I do agree there are faults
in this code. But I don't think *all* this code is fault :)
I'll plan my time to improve the clarity of this code.
------
Regards,
Alexander Korotkov
From | Date | Subject | |
---|---|---|---|
Next Message | Rupert Gallagher | 2021-01-17 22:51:35 | Re: BUG #16823: Unreachable code |
Previous Message | Tom Lane | 2021-01-17 19:42:01 | Re: BUG #16828: duplicate results when using ** recursive expression in JSON path |