Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?

From: Bryn Llewellyn <bryn(at)yugabyte(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: pgsql-general list <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?
Date: 2022-06-04 04:45:52
Message-ID: C5A701D6-50AC-4561-B64C-318DBF52A312@yugabyte.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> david(dot)g(dot)johnston(at)gmail(dot)com wrote:
>
> bryn(at)yugabyte(dot)com wrote:
>
>> declare
>> j1 constant jsonb not null := '{"x": 42, "y": null}';
>> j2 constant jsonb not null := '{"x": 42 }';
>> ...
>> (j1->>'y' is null)
>
> This produces a JSON Null which when asked for as a text data type results into SQL NULL due to implicit type coercion
>
>> (j2->>'y' is null)
>
> This produces "key y not present in JSON" but someone decided that was too unfriendly and so we instead produce SQL NULL. This SQL NULL exists for a different reason than the SQL NULL in the previous case. Unfortunately, there is only a single concept of NULL in the SQL Language.
>
>> predicate_2 constant boolean not null := j1 = j2;
>
> This is trivially false, j1 has a key of "y" while j2 does not. If there is a complaint to be had, this returning false isn't it.
>
>> Another test (I won't bore you with its code) confirms that the order in which the incoming document lists key-value pairs has no consequence for its meaning.
>
> Yes, the equality test is semantic, do all the same keys and corresponding values exist? If so, return true.
>
>> j1_in constant jsonb not null := '{"a": 17, "b": null}';
>> j2_in constant jsonb not null := '{"a": 17 }';
>>
>> This is the result:
>>
>> {"a": 17, "b": null}
>> {"a": 17, "b": null}
>
> Yep, for basically the same reason as the first point. The nulls are consequences of different situations (lack of key, value of key being null) being largely undistinguishable at the SQL level. We provide a "does key exists" operator if you do need to make that determination.
>
>> I have a little demo where I shred a set of "books" incoming JSON documents (where a book has a genre and many authors) into the classic Codd-and-Date four tables: books, authors, genres, and book_author_intersections. Then I scrunch each book back to a single JSON doc. I want to prove that I get back what I started with.
>
> Yeah, the lack of any goal of round tripping conversions from JSON through SQL and back into SQL makes proving that the system does such a thing problematic. You'll get a back something meaningfully equivalent, by your own argument, but not identical on a key-by-key basis.

Thanks for attempting to satisfy me, David—but I'm sad to say that I remain somewhat unsatisfied. This is doubtless my fault. I suppose that I knew, before I asked, that the ultimate answer would be "It is what it is and it will never change."

Of course, I can predict the outcomes of countless tests on the basis that I understand "what it is". Here's just one such (just like you suggested):

do $body$
declare
j1 constant jsonb not null := '{"a": 1, "b": null}';
j2 constant jsonb not null := '{"a": 1 }';

n1 constant int := (select count(*) from jsonb_object_keys(j1));
n2 constant int := (select count(*) from jsonb_object_keys(j2));
begin
assert (j1 ? 'b');
assert not (j2 ? 'b');
assert (n1 = 2) and (n2 = 1);
end;
$body$;

The idea of "null" is a strange beast just within PostgreSQL—let alone all SQL systems. The classic understanding of its semantics is "There is simply no information available about the value". And this explains the treacherous three-by-three outcome table for operators like OR and AND. However, this understanding can't cope with the outcome here:

create type t_ as (a int, b int);
do $body$
declare
b1 constant boolean := null;
b2 constant boolean := null;
r1 constant t_ not null := (b1, b2);
r2 constant t_ not null := (b1, b2);
begin
assert (b1 = b2) is null;
assert (b1 is not distinct from b2);

assert (r1 = r2);
end;
$body$;

I'd've thought that I'd need to weigh in with "is not distinct from" to get "r1" and "r2" to compare as the same just as I do with "b1" and "b2". So maybe it's not surprising that "null" in JSON is quirky too.

I'm going to try to think like this:

The number of possible spellings of the names of keys in a JSON object is some flavor of infinite. So including this in an object:

"k": null

really is saying something. It says that I do know about "k" and that yet I have simply no information available about its value.

The quirkiness that my first example showed can be seen differently from how I saw it first. When I now consider this expression:

('{"x": 42}'::jsonb)->>'y'

it seems that its evaluation should simply raise an exception. But you said:

> This produces "key y not present in JSON" but someone decided that was too unfriendly and so we instead produce SQL NULL.

Oh well, I know how to program the cases that I care about to get the outcomes that I want. It just means lots of typing. But that's anyway what one signs up for who decides to work with JSON…

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2022-06-04 05:37:43 Re: '{"x": 42, "y": null}'::jsonb != '{"x": 42}'::jsonb ... Really?
Previous Message mengfanjun 2022-06-04 03:07:56 Call pstrdup() of palloc.h will change source string, please help!