Re: SQL:2023 JSON simplified accessor support

From: Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>
To: Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>
Subject: Re: SQL:2023 JSON simplified accessor support
Date: 2024-11-21 20:52:49
Message-ID: CAK98qZ2QGcyJrJAFv9wjY6S8yP9dUVnmG9Gb4OXuzuMMuM1Z5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Nov 19, 2024 at 6:06 PM Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com> wrote:
>
> Hi, hackers.
>
> I have implemented dot notation for jsonb using type subscripting back
> in April 2023, but failed post it because I left Postgres Professional
> company soon after and have not worked anywhere since, not even had
> any interest in programming.
>
> But yesterday I accidentally decided to look what is going on at
> commitfests and found this thread. I immediately started to rebase
> code from PG16, fixed some bugs, and now I'm ready to present my
> version of the patches which is much more complex.
>
> Unfortunately, I probably won't be able to devote that much time to
> the patches as before.

Thank you so much, Nikita, for revisiting this topic and sharing your
v6 patches!

Now that we have two solutions, I’d like to summarize our current
options.

In Postgres, there are currently three ways to access json/jsonb
object fields and array elements:

1. '->' operator (Postgres-specific, predates SQL standard):

postgres=# select ('{"a": 1, "b": "c", "d": [1, 2, 3]}'::json) -> 'd'
-> 0; -- returns 1

2. jsonb subscripting (not available for the plain json type):

postgres=# select ('{"a": 1, "b": "c", "d": [1, 2,
3]}'::jsonb)['d'][0]; --returns 1

3. json_query() function:

postgres=# select json_query(jsonb '{"a": 1, "b": "c", "d": [1, 2,
3]}', 'lax $.d[0]'); --returns 1

A few weeks ago, I did the following performance benchmarking of the
three approaches:

-- setup:
create table tbl(id int, col1 jsonb);
insert into tbl select i, '{"x":"vx", "y":[{"a":[1,2,3]}, {"b":[1, 2,
{"j":"vj"}]}]}' from generate_series(1, 100000)i;

-- jsonb_operator.sql
SELECT id, col1 -> 'y' -> 1 -> 'b' -> 2 -> 'j' AS jsonb_operator FROM tbl;

-- jsonb_subscripting.sql
SELECT id, col1['y'][1]['b'][2]['j'] AS jsonb_subscript FROM tbl;

-- jsonb_path_query.sql
SELECT id, jsonb_path_query(col1, '$.y[1].b[2].j') FROM tbl;

# pgbench on my local MacOS machine, using -O3 optimization:
pgbench -n -f XXX.sql postgres -T100

Results (Latency | tps):

"->" operator: 14ms | 68
jsonb subscripting: 17ms | 58
jsonb_path_query() function: 23ms | 43

So performance from best to worst:
"->" operator > jsonb subscripting >> jsonb_path_query() function.

I’m excited to see your implementation of dot notation for jsonb using
type subscripting! This approach rounds out the three possible ways to
implement JSON simplified accessors:

## v1: json_query() implementation

Pros:
- Fully adheres to the SQL standard.

According to the SQL standard, if the JSON simplified accessor <JA> is
not a JSON item method, it is equivalent to a <JSON query>:

JSON_QUERY ( VEP, 'lax $.JC' WITH CONDITIONAL ARRAY WRAPPER NULL ON
EMPTY NULL ON ERROR)

(I’m skipping <JA> that includes a JSON item method, as it is
currently outside the scope of both sets of patches.)

- Easiest to implement

Cons:
- Slow due to function call overhead.

## v2-v5: "->" operator implementation

We initially chose this approach for its performance benefits.
However, while addressing Peter’s feedback on v5, I encountered the
following issue:

-- setup
create table test_json_dot(id serial primary key, test_json json);
insert into test_json_dot values (5, '[{"a": 1, "b": 42}, {"a": 2,
"b": {"c": 42}}]');

-- problematic query:
test1=# select id, (test_json).b, json_query(test_json, 'lax $.b' WITH
CONDITIONAL WRAPPER NULL ON EMPTY NULL ON ERROR) as expected from
test_json_dot;
id | b | expected
----+---+-----------------
5 | | [42, {"c": 42}]
(1 row)

This issue arises from the semantic differences between the "->"
operator and json_query’s "lax" mode. One possible workaround is to
redefine the "->" operator and modify its implementation. However, since
the "->" operator has been in use for a long time, such changes would
break backward compatibility.

## v6: jsonb subscription implementation

Nikita's patches pass all my functional test cases, including those
that failed with the previous approach.

Supported formats:
- JSON member accessor
- JSON wildcard member accessor (Not available in v5, so this is also a plus)
- JSON array accessor

Questions:

1. Since Nikita’s patches did not address the JSON data type, and JSON
currently does not support subscripting, should we limit the initial
feature set to JSONB dot-notation for now? In other words, if we aim
to fully support JSON simplified accessors for the plain JSON type,
should we handle support for plain JSON subscripting as a follow-up
effort?

2. I have yet to have a more thorough review of Nikita’s patches.
One area I am not familiar with is the hstore-related changes. How
relevant is hstore to the JSON simplified accessor?

Best,
Alex

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey M. Borodin 2024-11-21 20:52:52 Re: Forbid to DROP temp tables of other sessions
Previous Message Jeff Davis 2024-11-21 20:50:13 Re: Update Unicode data to Unicode 16.0.0