Re: [PATCH] json_lex_string: don't overread on bad UTF8

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date: 2024-05-07 21:06:10
Message-ID: CAOYmi+=yCFok+UNRHDJna5dSasqa9cMHviBZ6pYmtt1Yn_RfRg@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 6, 2024 at 8:43 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Fri, May 03, 2024 at 07:05:38AM -0700, Jacob Champion wrote:
> > We could port something like that to src/common. IMO that'd be more
> > suited for an actual conversion routine, though, as opposed to a
> > parser that for the most part assumes you didn't lie about the input
> > encoding and is just trying not to crash if you're wrong. Most of the
> > time, the parser just copies bytes between delimiters around and it's
> > up to the caller to handle encodings... the exceptions to that are the
> > \uXXXX escapes and the error handling.
>
> Hmm. That would still leave the backpatch issue at hand, which is
> kind of confusing to leave as it is. Would it be complicated to
> truncate the entire byte sequence in the error message and just give
> up because we cannot do better if the input byte sequence is
> incomplete?

Maybe I've misunderstood, but isn't that what's being done in v2?
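
For illustration only, and not taken from the v2 patch: a minimal C sketch of the truncation idea discussed above, where the bytes reported for a bad character are clamped to what is actually left in the buffer so that nothing past the end is read. The names utf8_expected_len() and report_len() are hypothetical and do not exist in PostgreSQL; the sketch also assumes the input is claimed to be UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper: the length a UTF-8 sequence claims to have, judged
 * from its lead byte alone.  Invalid lead bytes (including stray
 * continuation bytes) are treated as length 1 so a caller always makes
 * progress.
 */
size_t
utf8_expected_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((lead & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((lead & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 1;                   /* invalid lead byte */
}

/*
 * Hypothetical helper: number of bytes that may safely be quoted in an
 * error message for the character starting at s, when only "remaining"
 * bytes are left in the input.  The claimed length is clamped to
 * "remaining", so a truncated multibyte sequence is reported as-is
 * instead of being read past the end of the buffer.
 */
size_t
report_len(const char *s, size_t remaining)
{
    size_t      want;

    if (remaining == 0)
        return 0;

    want = utf8_expected_len((unsigned char) s[0]);
    return (want < remaining) ? want : remaining;
}
```

With a buffer that ends in the two bytes 0xE3 0x81 (a three-byte sequence missing its last byte), report_len() on the 0xE3 byte with remaining = 2 yields 2 rather than 3.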

> > Maybe I'm missing
> > code somewhere, but I don't see a conversion routine from
> > json_errdetail() to the actual client/locale encoding. (And the parser
> > does not support multibyte input_encodings that contain ASCII in trail
> > bytes.)
>
> Referring to json_lex_string() that does UTF-8 -> ASCII -> give-up in
> its conversion for FRONTEND, I guess? Yep. This limitation looks
> like a problem, especially if plugging that to libpq.

Okay. How we deal with that will likely guide the "optimal" fix to
error reporting, I think...

Thanks,
--Jacob
