Re: BUG #16828: duplicate results when using ** recursive expression in JSON path

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: shammat(at)gmx(dot)net, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #16828: duplicate results when using ** recursive expression in JSON path
Date: 2021-01-17 21:24:19
Message-ID: CAPpHfdtGcnt8p7pRi6s6qEszqQbQkDjq5pWQGRh_EDExuTW0QA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sun, Jan 17, 2021 at 10:42 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > The following query
>
> > select jsonb_path_query_array(col, '$.**.itemName')
> > from (
> > values ('{"items": [{"itemName": "a", "items": [{"itemName":
> > "b"}]}]}'::jsonb)
> > ) as t(col)
>
> > returns ["a", "a", "b", "b"] but should return only ["a", "b"] as each
value
> > only appears once in the JSON value.
>
> jsonpath_exec.c is a muddle of undocumented code, but I think
> I found where the duplication is occurring: in executeAnyItem(),
> the first occurrence comes out while recursing down from the
> executeItemOptUnwrapTarget call at line 1419 (as of HEAD), and then
> the duplicate comes out while recursing down from the executeAnyItem
> call at line 1439. So I'd say that that logic for
> "ignoreStructuralErrors" needs a rethink. (BTW, why is half of this
> code relying on cxt->ignoreStructuralErrors while the other half
> receives a passed-down flag? That seems like a recipe for bugs,
> especially with the lack of commentary about the reason for it.)

I think that's not just a bug in the code, but a high-level problem.

** operator enumerates all the subdocuments.

# select * from jsonb_path_query('{"items": [{"itemName": "a", "items":
[{"itemName": "b"}]}]}'::jsonb, '$.**');
jsonb_path_query
--------------------------------------------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]}
[{"items": [{"itemName": "b"}], "itemName": "a"}]
{"items": [{"itemName": "b"}], "itemName": "a"}
[{"itemName": "b"}]
{"itemName": "b"}
"b"
"a"
(7 rows)

After that, .itemName accessor is applied. But in the lax mode this
accessor automatically unwraps the arrays.

# select subres, jsonb_path_query_array(subres, '$.itemName') from
jsonb_path_query('{"items": [{"itemName": "a", "items": [{"itemName":
"b"}]}]}'::jsonb, '$.**') subres;
subres |
jsonb_path_query_array
--------------------------------------------------------------+------------------------
{"items": [{"items": [{"itemName": "b"}], "itemName": "a"}]} | []
[{"items": [{"itemName": "b"}], "itemName": "a"}] | ["a"]
{"items": [{"itemName": "b"}], "itemName": "a"} | ["a"]
[{"itemName": "b"}] | ["b"]
{"itemName": "b"} | ["b"]
"b" | []
"a" | []
(7 rows)

So, everything works as designed, but the design is probably wrong. The
issue here is that ** is our extension of the standard, and it works
strangely in lax mode. But in strict mode everything looks OK.

# select jsonb_path_query_array('{"items": [{"itemName": "a", "items":
[{"itemName": "b"}]}]}'::jsonb, 'strict $.**.itemName');
jsonb_path_query_array
------------------------
["a", "b"]
(1 row)

Should we consider everything that comes after ** always in strict mode?

> Alexander, git blame says all this code is your fault ...

I do agree all faults in this code are mine. I do agree there are faults
in this code. But I don't think *all* this code is fault :)

I'll plan my time to improve the clarity of this code.

------
Regards,
Alexander Korotkov

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Rupert Gallagher 2021-01-17 22:51:35 Re: BUG #16823: Unreachable code
Previous Message Tom Lane 2021-01-17 19:42:01 Re: BUG #16828: duplicate results when using ** recursive expression in JSON path