Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Yan Chengpeng <chengpeng_yan(at)outlook(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Fix jsonb comparison for raw scalar pseudo arrays
Date: 2024-11-29 08:37:55
Message-ID: CACJufxGehME464jfd2_hTQ8_ZYS5jFD8nxfh30tPQHryPBz7QQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 18, 2024 at 10:25 PM Yan Chengpeng
<chengpeng_yan(at)outlook(dot)com> wrote:
>
>
> I encountered an issue with the B-Tree ordering of `jsonb` values. According to the PostgreSQL documentation[1], the ordering should follow this precedence:
>
> `Object > Array > Boolean > Number > String > Null`
>
>
> However, empty arrays (`[]`) are currently considered smaller than `null`, which violates the documented rules. This occurs due to improper handling of the `rawScalar` flag when comparing arrays in the `compareJsonbContainers()` function in `src/backend/utils/adt/jsonb_util.c`.
>
> ```
>
> The empty array ([]) is incorrectly placed before null.
>
>
> Analysis
>
>
> The issue stems from how the rawScalar flag is evaluated in the compareJsonbContainers() function.
> When comparing arrays, the function does not prioritize the rawScalar flag before comparing the number of elements (nElems), leading to incorrect ordering for arrays treated as “raw scalars.”
>
> Proposed Fix
>
> The proposed fix ensures the rawScalar flag is checked first, and only when both values have the same flag, the number of elements is compared.
> This guarantees correct ordering of arrays and scalar values. The details are in the attached patch.
>

per https://www.postgresql.org/docs/current/datatype-json.html#JSON-INDEXING
Object > Array > Boolean > Number > String > Null

JsonbValue->val.array.rawScalar is false (that is the real array)
should be larger than scalar (Boolean, Number, String, Null).

while sorting, rawScalar flag should have more priority than comparing
the number of elements in an array.
if two jsonb, JsonbValue->val.array.rawScalar values are different,
then we don't need to compare val.array.nElems.

so I think you are right.
but I am confused with your comments change.

src5=# select 'a' < 'A' collate "en_US.utf8";
?column?
----------
t
(1 row)

src5=# select 'a' < 'A' collate "C";
?column?
----------
f
(1 row)

docs says:
""Primitive JSON values are compared using the same comparison rules
as for the underlying PostgreSQL data type.
Strings are compared using the default database collation.
""
To make the regress tests stable, you may need to change the regress
test value ("a", "A")

the only corner case is the empty jsonb array [].
so sql test like:
select jsonb '[]' < jsonb 'null';

should enough?

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robins Tharakan 2024-11-29 08:42:29 psql: Add tab completion for ALTER USER RESET
Previous Message Kirill Reshke 2024-11-29 08:37:20 Re: [PATCH] SVE popcount support