Re: SQL/JSON query functions context_item doc entry and type requirement

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: jian he <jian(dot)universality(at)gmail(dot)com>, Chapman Flack <jcflack(at)acm(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL/JSON query functions context_item doc entry and type requirement
Date: 2024-06-21 12:18:15
Message-ID: CA+HiwqHgZcmBYXwMaKNPf2RUKaGVcS7JaOM2YeORqWShUA43oQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 21, 2024 at 9:47 AM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
> On Thu, Jun 20, 2024 at 9:01 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>>
>> On Thu, Jun 20, 2024 at 5:46 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>> >
>> > On Thu, Jun 20, 2024 at 1:03 AM David G. Johnston
>> > <david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>> > > On Wed, Jun 19, 2024 at 8:29 AM jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>> > >>
>> > >> On Mon, Jun 17, 2024 at 9:05 PM Chapman Flack <jcflack(at)acm(dot)org> wrote:
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > On 06/17/24 02:43, Amit Langote wrote:
>> > >> > > <replaceable>context_item</replaceable> expression can be a value of
>> > >> > > any type that can be cast to <type>jsonb</type>. This includes types
>> > >> > > such as <type>char</type>, <type>text</type>, <type>bpchar</type>,
>> > >> > > <type>character varying</type>, and <type>bytea</type> (with
>> > >> > > <code>ENCODING UTF8</code>), as well as any domains over these types.
>> > >> >
>> > >> > Reading this message in conjunction with [0] makes me think that we are
>> > >> > really talking about a function that takes a first parameter of type jsonb,
>> > >> > and behaves exactly that way (so any cast required is applied by the system
>> > >> > ahead of the call). Under those conditions, this seems like an unusual
>> > >> > sentence to add in the docs, at least until we have also documented that
>> > >> > tan's argument can be of any type that can be cast to double precision.
>> > >> >
>> > >>
>> > >> I guess it would be fine to add an unusual sentence to the docs.
>> > >>
>> > >> imagine a function: array_avg(anyarray) returns anyelement.
>> > >> array_avg calculate an array's elements's avg. like
>> > >> array('{1,2,3}'::int[]) returns 2.
>> > >> but array_avg won't make sense if the input argument is a date array.
>> > >> so mentioning in the doc: array_avg can accept anyarray, but anyarray
>> > >> cannot date array.
>> > >> seems ok.
>> > >
>> > >
>> > > There is existing wording for this:
>> > >
>> > > "The expression can be of any JSON type, any character string type, or bytea in UTF8 encoding."
>> > >
>> > > If you add this sentence to the paragraph the link that already exists, which simply points the reader to this sentence, becomes redundant and should be removed.
>> >
>> > I've just posted a patch in the other thread [1] to restrict
>> > context_item to be of jsonb type, which users would need to ensure by
>> > adding an explicit cast if needed. I think that makes this
>> > clarification unnecessary.
>> >
>> > > As for table 9.16.3 - it is unwieldy already. Lets try and make the core syntax shorter, not longer. We already have precedence in the subsequent json_table section - give each major clause item a name then below the table define the syntax and meaning for those names. Unlike in that section - which probably should be modified too - context_item should have its own description line.
>> >
>> > I had posted a patch a little while ago at [1] to render the syntax a
>> > bit differently with each function getting its own syntax synopsis.
>> > Resending it here; have addressed Jian He's comments.
>> >
>> > --
>
>
> I was thinking more like:
>
> diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
> index c324906b22..b9d157663a 100644
> --- a/doc/src/sgml/func.sgml
> +++ b/doc/src/sgml/func.sgml
> @@ -18692,8 +18692,10 @@ $.* ? (@ like_regex "^\\d+$")
> <entry role="func_table_entry"><para role="func_signature">
> <indexterm><primary>json_exists</primary></indexterm>
> <function>json_exists</function> (
> - <replaceable>context_item</replaceable>, <replaceable>path_expression</replaceable> <optional> <literal>PASSING</literal> { <replaceable>value</replaceable> <literal>AS</literal> <replaceable>varname</replaceable> } <optional>, ...</optional></optional>
> - <optional> { <literal>TRUE</literal> | <literal>FALSE</literal> |<literal> UNKNOWN</literal> | <literal>ERROR</literal> } <literal>ON ERROR</literal> </optional>)
> + <replaceable>context_item</replaceable>,
> + <replaceable>path_expression</replaceable>
> + <optional>variable_definitions</optional>
> + <optional>on_error_boolean</optional>)
> </para>
> <para>
> Returns true if the SQL/JSON <replaceable>path_expression</replaceable>
> @@ -18732,12 +18734,14 @@ ERROR: jsonpath array subscript is out of bounds
> <entry role="func_table_entry"><para role="func_signature">
> <indexterm><primary>json_query</primary></indexterm>
> <function>json_query</function> (
> - <replaceable>context_item</replaceable>, <replaceable>path_expression</replaceable> <optional> <literal>PASSING</literal> { <replaceable>value</replaceable> <literal>AS</literal> <replaceable>varname</replaceable> } <optional>, ...</optional></optional>
> - <optional> <literal>RETURNING</literal> <replaceable>data_type</replaceable> <optional> <literal>FORMAT JSON</literal> <optional> <literal>ENCODING UTF8</literal> </optional> </optional> </optional>
> - <optional> { <literal>WITHOUT</literal> | <literal>WITH</literal> { <literal>CONDITIONAL</literal> | <optional><literal>UNCONDITIONAL</literal></optional> } } <optional> <literal>ARRAY</literal> </optional> <literal>WRAPPER</literal> </optional>
> - <optional> { <literal>KEEP</literal> | <literal>OMIT</literal> } <literal>QUOTES</literal> <optional> <literal>ON SCALAR STRING</literal> </optional> </optional>
> - <optional> { <literal>ERROR</literal> | <literal>NULL</literal> | <literal>EMPTY</literal> { <optional> <literal>ARRAY</literal> </optional> | <literal>OBJECT</literal> } | <literal>DEFAULT</literal> <replaceable>expression</replaceable> } <literal>ON EMPTY</literal> </optional>
> - <optional> { <literal>ERROR</literal> | <literal>NULL</literal> | <literal>EMPTY</literal> { <optional> <literal>ARRAY</literal> </optional> | <literal>OBJECT</literal> } | <literal>DEFAULT</literal> <replaceable>expression</replaceable> } <literal>ON ERROR</literal> </optional>)
> + <replaceable>context_item</replaceable>,
> + <replaceable>path_expression</replaceable>
> + <optional>variable_definitions</optional>
> + <optional>return_clause</optional>
> + <optional>wrapping_clause</optional>
> + <optional>quoting_clause</optional>
> + <optional>on_empty_set</optional>
> + <optional>on_error_set</optional>)
> </para>
> <para>
> Returns the result of applying the SQL/JSON
> @@ -18809,11 +18813,12 @@ DETAIL: Missing "]" after array dimensions.
> <entry role="func_table_entry"><para role="func_signature">
> <indexterm><primary>json_value</primary></indexterm>
> <function>json_value</function> (
> - <replaceable>context_item</replaceable>, <replaceable>path_expression</replaceable>
> - <optional> <literal>PASSING</literal> { <replaceable>value</replaceable> <literal>AS</literal> <replaceable>varname</replaceable> } <optional>, ...</optional></optional>
> - <optional> <literal>RETURNING</literal> <replaceable>data_type</replaceable> </optional>
> - <optional> { <literal>ERROR</literal> | <literal>NULL</literal> | <literal>DEFAULT</literal> <replaceable>expression</replaceable> } <literal>ON EMPTY</literal> </optional>
> - <optional> { <literal>ERROR</literal> | <literal>NULL</literal> | <literal>DEFAULT</literal> <replaceable>expression</replaceable> } <literal>ON ERROR</literal> </optional>)
> + <replaceable>context_item</replaceable>,
> + <replaceable>path_expression</replaceable>
> + <optional>variable_definitions</optional>
> + <optional>return_type</optional>
> + <optional>on_empty_value</optional>
> + <optional>on_error_value</optional>)
> </para>
> <para>
> Returns the result of applying the SQL/JSON
>
> Then defining each of those below the table - keeping the on_error variants together.

That sounds appealing. I'll try to come up with a patch unless you or
anyone else wants to take a stab at it.

>> playing around with it.
>> found some minor issues:
>>
>> json_exists allow: DEFAULT expression ON ERROR, which is not
>> mentioned in the doc.
>> for example:
>> select JSON_EXISTS(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' default
>> true ON ERROR);
>> select JSON_EXISTS(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' default 0 ON ERROR);
>> select JSON_EXISTS(jsonb '{"a": [1,2,3]}', 'strict $.a[5]' default 11 ON ERROR);
>
>
> Yeah, surprised it works, the documented behavior seems logical. Being able to return a non-boolean here seems odd. Especially since it is cast to boolean on output.
>
>>
>> JSON_VALUE on error, on empty semantics should be the same as json_query.
>> like:
>> [ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression }
>> ON EMPTY ]
>> [ { ERROR | NULL | EMPTY { [ ARRAY ] | OBJECT } | DEFAULT expression }
>> ON ERROR ])
>>
>> examples:
>> select JSON_value(jsonb '[]' , '$' empty array on error);
>> select JSON_value(jsonb '[]' , '$' empty object on error);
>
>
> Again the documented behavior seems to make sense though and the ability to specify empty in the value function seems like a bug. If you really want an empty array or object you do have access to default. The reason json_query provides for an empty array/object is that it is already expecting to produce an array (object seems a little odd).
>
> I agree our docs and code do not match which needs to be fixed, ideally in the direction of the standard which I'm guessing our documentation is based off of. But let's not go off of my guess.

Oops, that is indeed not great and, yes, the problem is code not
matching the documentation, the latter of which is "correct".

Basically, the grammar allows specifying any of the all possible ON
ERROR/EMPTY behavior values irrespective of the function, so parse
analysis should be catching and flagging an attempt to specify
incompatible value for a given function, which it does not.

I've attached a patch to fix that, which I'd like to push before
anything else we've been discussing.

--
Thanks, Amit Langote

Attachment Content-Type Size
v1-0001-SQL-JSON-Disallow-incompatible-values-in-ON-ERROR.patch application/octet-stream 8.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Bapat 2024-06-21 12:29:12 Re: Improve EXPLAIN output for multicolumn B-Tree Index
Previous Message Jelte Fennema-Nio 2024-06-21 12:07:21 Re: Small LO_BUFSIZE slows down lo_import and lo_export in libpq