Re: ERROR: unsupported Unicode escape sequence - in JSON-type column

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Erik Wienhold <ewie(at)ewie(dot)name>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Jan Bilek <jan(dot)bilek(at)eftlab(dot)com(dot)au>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: unsupported Unicode escape sequence - in JSON-type column
Date: 2023-02-27 16:12:19
Message-ID: 368156.1677514339@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Erik Wienhold <ewie(at)ewie(dot)name> writes:
>> On 27/02/2023 13:13 CET Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>> I'd be curious to know how the customer managed to do that.
>> Perhaps there is a loophole in PostgreSQL that needs to be fixed.

> Another reason to prefer jsonb over json to reject such inputs right away.
> The documentation states that json does not validate inputs in constrast to
> jsonb.

It's not that it doesn't validate, it's that the validation rules are
different. Per the manual [1]:

RFC 7159 permits JSON strings to contain Unicode escape sequences
denoted by \uXXXX. In the input function for the json type, Unicode
escapes are allowed regardless of the database encoding, and are
checked only for syntactic correctness (that is, that four hex digits
follow \u). However, the input function for jsonb is stricter: it
disallows Unicode escapes for characters that cannot be represented in
the database encoding. The jsonb type also rejects \u0000 (because
that cannot be represented in PostgreSQL's text type), and it insists
that any use of Unicode surrogate pairs to designate characters
outside the Unicode Basic Multilingual Plane be correct.

You can certainly quibble with our decisions here, but I think they
are reasonably consistent. json is for data that you'd like a syntax
check on (else you might as well store it as "text"), but no more than
a syntax check, because you're going to do the actual JSON processing
elsewhere and you don't want Postgres opining on what semi-standard
JSON constructs mean. If you're actually going to process the data
inside the database, jsonb is a better choice. The extra restrictions
in jsonb are to ensure that a string value represented in JSON can be
extracted into a valid string of our text datatype.

Storing data in json and then casting to jsonb on-the-fly seems like
about the worst possible combination of choices.

regards, tom lane

[1] https://www.postgresql.org/docs/current/datatype-json.html

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2023-02-27 16:14:25 Re: pg_upgradecluster transfering only a portion of the data
Previous Message Dávid Suchan 2023-02-27 15:44:58 pg_upgradecluster transfering only a portion of the data