From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com> |
Subject: | Re: SQL:2023 JSON simplified accessor support |
Date: | 2024-11-21 22:46:50 |
Message-ID: | b24dc997-d78f-47bd-b323-14e68232d127@dunslane.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2024-11-21 Th 3:52 PM, Alexandra Wang wrote:
> Hi,
>
> On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com> wrote:
>> Hi, hackers.
>>
>> I have implemented dot notation for jsonb using type subscripting back
>> in April 2023, but failed post it because I left Postgres Professional
>> company soon after and have not worked anywhere since, not even had
>> any interest in programming.
>>
>> But yesterday I accidentally decided to look what is going on at
>> commitfests and found this thread. I immediately started to rebase
>> code from PG16, fixed some bugs, and now I'm ready to present my
>> version of the patches which is much more complex.
>>
>> Unfortunately, I probably won't be able to devote that much time to
>> the patches as before.
> Thank you so much, Nikita, for revisiting this topic and sharing your
> v6 patches!
>
> Now that we have two solutions, I’d like to summarize our current
> options.
>
> In Postgres, there are currently three ways to access json/jsonb
> object fields and array elements:
>
> 1. '->' operator (Postgres-specific, predates SQL standard):
>
> postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd'
> -> 0; -- returns 1
>
> 2. jsonb subscripting (not available for the plain json type):
>
> postgres=# select ('{"a": 1, "b": "c", "d": [1, 2,
> 3]}'::jsonb)['d'][0]; --returns 1
>
> 3. json_query() function:
>
> postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2,
> 3]}', 'lax $.d[0]'); --returns 1
>
> A few weeks ago, I did the following performance benchmarking of the
> three approaches:
>
> -- setup:
> create table tbl(id int, col1 jsonb);
> insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2,
> {"j":"vj"}]}]}' from generate_series(1, 100000)i;
>
> -- jsonb_operator.sql
> SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl;
>
> -- jsonb_subscripting.sql
> SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl;
>
> -- jsonb_path_query.sql
> SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl;
>
> # pgbench on my local MacOS machine, using -O3 optimization:
> pgbench -n -f XXX.sql postgres -T100
>
> Results (Latency | tps):
>
> "->" operator: 14ms | 68
> jsonb subscripting: 17ms | 58
> jsonb_path_query() function: 23ms | 43
>
> So performance from best to worst:
> "->" operator > jsonb subscripting >> jsonb_path_query() function.
>
> I’m excited to see your implementation of dot notation for jsonb using
> type subscripting! This approach rounds out the three possible ways to
> implement JSON simplified accessors:
>
> ## v1: json_query() implementation
>
> Pros:
> - Fully adheres to the SQL standard.
>
> According to the SQL standard, if the JSON simplified accessor <JA> is
> not a JSON item method, it is equivalent to a <JSON query>:
>
> JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
> EMPTY NULL ON ERROR)
>
> (I’m skipping <JA> that includes a JSON item method, as it is
> currently outside the scope of both sets of patches.)
>
> - Easiest to implement
>
> Cons:
> - Slow due to function call overhead.
>
> ## v2-v5: "->" operator implementation
>
> We initially chose this approach for its performance benefits.
> However, while addressing Peter’s feedback on v5, I encountered the
> following issue:
>
> -- setup
> create table test_json_dot(id serial primary key, test_json json);
> insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2,
> "b": {"c": 42}}]');
>
> -- problematic query:
> test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH
> CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from
> test_json_dot;
> id | b | expected
> ----+---+-----------------
> 5 | | [42, {"c": 42}]
> (1 row)
>
> This issue arises from the semantic differences between the "->"
> operator and json_query’s "lax" mode. One possible workaround is to
> redefine the "->" operator and modify its implementation. However, since
> the "->" operator has been in use for a long time, such changes would
> break backward compatibility.
>
> ## v6: jsonb subscription implementation
>
> Nikita's patches pass all my functional test cases, including those
> that failed with the previous approach.
>
> Supported formats:
> - JSON member accessor
> - JSON wildcard member accessor (Not available in v5, so this is also a plus)
> - JSON array accessor
>
> Questions:
>
> 1. Since Nikita’s patches did not address the JSON data type, and JSON
> currently does not support subscripting, should we limit the initial
> feature set to JSONB dot-notation for now? In other words, if we aim
> to fully support JSON simplified accessors for the plain JSON type,
> should we handle support for plain JSON subscripting as a follow-up
> effort?
>
> 2. I have yet to have a more thorough review of Nikita’s patches.
> One area I am not familiar with is the hstore-related changes. How
> relevant is hstore to the JSON simplified accessor?
>
We can't change the way the "->" operator works, as there could well be
uses of it in the field that rely on its current behaviour. But maybe we
could invent a new operator which is compliant with the standard
semantics for dot access, and call that. Then we'd get the best
performance, and also we might be able to implement it for the plain
JSON type. If that proves not possible we can think about not
implementing for plain JSON, but I'd rather not go there until we have to.
I don't think we should be including hstore changes here - we should
just be aiming at implementing the standard for JSON access. hstore
changes if any should be a separate feature. The aren't relevant to JSON
access, although they might use some of the same infrastructure,
depending on implementation.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tomas Vondra | 2024-11-21 23:05:40 | Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly |
Previous Message | Melanie Plageman | 2024-11-21 22:43:40 | Re: Count and log pages set all-frozen by vacuum |