From: | Nikita Glukhov <glukhov(dot)n(dot)a(at)gmail(dot)com> |
---|---|
To: | Alexandra Wang <alexandra(dot)wang(dot)oss(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com> |
Subject: | Re: SQL:2023 JSON simplified accessor support |
Date: | 2024-11-20 00:06:16 |
Message-ID: | adb15d59-2e6b-4d2e-ac5b-7aaea41f738f@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi, hackers.
I have implemented dot notation for jsonb using type subscripting back
in April 2023, but failed post it because I left Postgres Professional
company soon after and have not worked anywhere since, not even had
any interest in programming.
But yesterday I accidentally decided to look what is going on at
commitfests and found this thread. I immediately started to rebase
code from PG16, fixed some bugs, and now I'm ready to present my
version of the patches which is much more complex.
Unfortunately, I probably won't be able to devote that much time to
the patches as before.
Description of the patches:
1. Allow transformation only of a sublist of subscripts
This gives ability to custom subscripting code to consume only the
supported prefix of the subscripts list. Remaining subscripts are
applied to the result of subscription which can be of different data
type.
Example of behavior change for hstore which always consumes only the
first subscript and returns text:
=# select (hstore 'foo=>bar')['foo']['bar'];
-ERROR: hstore allows only one subscript
+ERROR: cannot subscript type text because it does not support subscripting
2. GUC compat_field_notation disables treating of field selection as a
function application (non-standard PG feature) if the column data type
supports subscripting.
Example for function hstore_hash(hstore) (after patch #4 applied):
=# set compat_field_notation = on;
=# select (hstore 'foo=>bar,hstore_hash=>123').hstore_hash; -- function
hstore_hash
------------
-1718799972
=# select (hstore 'foo=>bar,hstore_hash=>123').foo;
foo
-----
bar
=# set compat_field_notation = off; -- default
=# select (hstore 'foo=>bar').hstore_hash; -- missing key
hstore_hash
-------------
=# select (hstore 'foo=>bar,hstore_hash=>123').hstore_hash;
hstore_hash
-------------
123
3. Enable processing of field accessors by generic subscripting code.
Field accessors are represented as String nodes in refupperexprs for
distinguishing from ordinary text subscripts which can be needed for
correct EXPLAIN.
Strings node seem to no longer be a valid expression nodes, so I
had to add quite ugly handling for them in nodeFuncs etc during rebase
to PG18.
4. Implement read-only dot notation for hstore (see example above).
5. Enable processing of .* by generic subscripting (similary to #3)
6. Export jsonPathFromParseResult(): simple refactoring for #7.
7. Implement read-only dot notation for jsonb using jsonpath.
A list of subscripts is converted to jsonpath using
JsonPathParseItem. Non-constant array accessors are replaced with
jsonpath parameters.
TODO:
- wildcard array accessor [*]
- item methods
There is some inconsistency in subscripting in jsonpath, which
supports only arrays with numeric indexes, and existing generic
subscripting for jsonb, which also supports indexing of objects by
string keys.
-- JSON_QUERY(jb, '$.a["b"]')
=# select (jsonb '{"a": {"b": 1}}').a['b'];
a
---
NULL
-- JSON_QUERY(jb, '$.a')['b'] = generic subscripting after JSON_QUERY
=# select ((jsonb '{"a": {"b": 1}}').a)['b'];
a
---
1
So, I think it would be good to enable subscripting of objects by
strings in jsonpath.
8. Extract transformColumnRefInternal(): simple refactoring for #9
9. Enable non-parenthesized column references in dot notation.
I think such patch needs a separate thread.
This feature, required by the SQL standard, is implemented by calling
transformColumnRefInternal() in a loop and passing different prefixes
of the field chain. But I'm not sure how correct this is, I simply
tried to preserve compatibility with existing name resolution rules.
For example, for A.B.C.D.E we will try the following variants of
splitting to column reference and indirections:
(A.B.C.D).E = db A, schema B, table C, column D
(A.B.C).D.E = schema A, table B, column C
(A.B).C.D.E = table A, column B
(A).B.C.D.E is not tried because by the SQL standard column reference
in dot notation should be prefixed with table name. Although it would
be very nice to write "jb.key" instead of "tab.jb.key".
There is also an inconsistency in execution when the accessor chain is
separated by parentheses.
Example:
(jb).a [0] => JSON_QUERY(jb, '$.a[0]')
((jb).a)[0] => JSON_QUERY(jb, '$.a') [0]
=# select (jsonb '[{"a": 1}, {"a": 2}]').a[0];
a
--------
[1, 2]
- jsonpath '$.a' returns two items: 1, 2
- jsonpath '[0]' returns the same items due to auto array wrapping
- items wrapped into array
=# select ((jsonb '[{"a": 1}, {"a": 2}]').a)[0];
a
---
1
- JSON_QUERY(jb, '$.a') returns array [1, 2],
- generic subscript '[0]' extracts its first element
The problem could be solved if we somehow allowed intermediate
jsonpaths to return unwrapped items sequences (like SRF).
Also I think it would be an interesting task to implement the
assignment to JSON using dot notation, but it is unclear what to do
here with .* and [*].
Attachment | Content-Type | Size |
---|---|---|
v6-0001-Allow-transformation-only-of-a-sublist-of-subscri.patch | text/x-patch | 8.4 KB |
v6-0002-Add-GUC-compat_field_notation.patch | text/x-patch | 1.8 KB |
v6-0003-Pass-field-accessors-to-generic-subscripting.patch | text/x-patch | 18.9 KB |
v6-0004-Implement-read-only-dot-notation-for-hstore.patch | text/x-patch | 4.4 KB |
v6-0005-Allow-processing-of-.-by-generic-subscripting.patch | text/x-patch | 8.8 KB |
v6-0006-Export-jsonPathFromParseResult.patch | text/x-patch | 2.4 KB |
v6-0007-Implement-read-only-dot-notation-for-jsonb-using-.patch | text/x-patch | 32.5 KB |
v6-0008-Extract-transformColumnRefInternal.patch | text/x-patch | 8.3 KB |
v6-0009-Enable-non-parenthesized-column-references-in-dot.patch | text/x-patch | 27.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Sabino Mullane | 2024-11-20 00:12:37 | Re: Proposals for EXPLAIN: rename ANALYZE to EXECUTE and extend VERBOSE |
Previous Message | Michael Paquier | 2024-11-20 00:01:26 | Re: per backend I/O statistics |