Re: remaining sql/json patches

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Himanshu Upadhyaya <upadhyaya(dot)himanshu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Erik Rijkers <er(at)xs4all(dot)nl>, jian he <jian(dot)universality(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: remaining sql/json patches
Date: 2024-03-07 12:06:08
Message-ID: CA+HiwqF5CHyRBO8WwOcQsKufFWZ1yjDaQQ7DJxsQNbm2nab3pg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 7, 2024 at 8:13 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 3/7/24 06:18, Himanshu Upadhyaya wrote:

Thanks Himanshu for the testing.

> > On Wed, Mar 6, 2024 at 9:04 PM Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
> > wrote:
> >>
> >> I'm pretty sure this is the correct & expected behavior. The second
> >> query treats the value as string (because that's what should happen for
> >> values in double quotes).
> >>
> >> ok, Then why does the below query provide the correct conversion, even if
> > we enclose that in double quotes?
> > ‘postgres[102531]=#’SELECT * FROM JSON_TABLE(jsonb '{
> > "id" : "1234567890",
> > "FULL_NAME" : "JOHN DOE"}',
> > '$'
> > COLUMNS(
> > name varchar(20) PATH 'lax $.FULL_NAME',
> > id int PATH 'lax $.id'
> > )
> > )
> > ;
> > name | id
> > ----------+------------
> > JOHN DOE | 1234567890
> > (1 row)
> >
> > and for bigger input(string) it will leave as empty as below.
> > ‘postgres[102531]=#’SELECT * FROM JSON_TABLE(jsonb '{
> > "id" : "12345678901",
> > "FULL_NAME" : "JOHN DOE"}',
> > '$'
> > COLUMNS(
> > name varchar(20) PATH 'lax $.FULL_NAME',
> > id int PATH 'lax $.id'
> > )
> > )
> > ;
> > name | id
> > ----------+----
> > JOHN DOE |
> > (1 row)
> >
> > seems it is not something to do with data enclosed in double quotes but
> > somehow related with internal casting it to integer and I think in case of
> > bigger input it is not able to cast it to integer(as defined under COLUMNS
> > as id int PATH 'lax $.id')
> >
> > ‘postgres[102531]=#’SELECT * FROM JSON_TABLE(jsonb '{
> > "id" : "12345678901",
> > "FULL_NAME" : "JOHN DOE"}',
> > '$'
> > COLUMNS(
> > name varchar(20) PATH 'lax $.FULL_NAME',
> > id int PATH 'lax $.id'
> > )
> > )
> > ;
> > name | id
> > ----------+----
> > JOHN DOE |
> > (1 row)
> > )
> >
> > if it is not able to represent it to integer because of bigger input, it
> > should error out with a similar error message instead of leaving it empty.
> >
> > Thoughts?
> >
>
> Ah, I see! Yes, that's a bit weird. Put slightly differently:
>
> test=# SELECT * FROM JSON_TABLE(jsonb '{"id" : "2000000000"}',
> '$' COLUMNS(id int PATH '$.id'));
> id
> ------------
> 2000000000
> (1 row)
>
> Time: 0.248 ms
> test=# SELECT * FROM JSON_TABLE(jsonb '{"id" : "3000000000"}',
> '$' COLUMNS(id int PATH '$.id'));
> id
> ----
>
> (1 row)
>
> Clearly, when converting the string literal into int value, there's some
> sort of error handling that realizes 3B overflows, and returns NULL
> instead. I'm not sure if this is intentional.

Indeed.

This boils down to the difference in the cast expression chosen to
convert the source value to int in the two cases.

The case where the source value has no quotes, the chosen cast
expression is a FuncExpr for function numeric_int4(), which has no way
to suppress errors. When the source value has quotes, the cast
expression is a CoerceViaIO expression, which can suppress the error.
The default behavior is to suppress the error and return NULL, so the
correct behavior is when the source value has quotes.

I think we'll need either:

* fix the code in 0001 to avoid getting numeric_int4() in this case,
and generally cast functions that don't have soft-error handling
support, in favor of using IO coercion.
* fix FuncExpr (like CoerceViaIO) to respect SQL/JSON's request to
suppress errors and fix downstream functions like numeric_int4() to
comply by handling errors softly.

I'm inclined to go with the 1st option as we already have the
infrastructure in place -- input functions can all handle errors
softly.

For the latter, it uses numeric_int4() which doesn't support
soft-error handling, so throws the error. With quotes, the

--
Thanks, Amit Langote

--
Thanks, Amit Langote

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-03-07 12:24:55 Re: [HACKERS] make async slave to wait for lsn to be replayed
Previous Message Ashutosh Bapat 2024-03-07 11:54:40 Re: Invalid query generated by postgres_fdw with UNION ALL and ORDER BY