Re: [PATCH] json_lex_string: don't overread on bad UTF8

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date: 2024-05-09 04:27:08
Message-ID: ZjxQnOD1OoCkEeMN@paquier.xyz
Lists: pgsql-hackers

On Wed, May 08, 2024 at 07:01:08AM -0700, Jacob Champion wrote:
> On Tue, May 7, 2024 at 10:31 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> But looking closer, I can see that in the JSON_INVALID_TOKEN case,
>> when !tok_done, we set token_terminator to point to the end of the
>> token, and that would include an incomplete byte sequence like in your
>> case. :/
>
> Ah, I see what you're saying. Yeah, that approach would need some more
> invasive changes.

My first feeling was actually to do that, and report the location in
the input string where we are seeing issues. All code paths playing
with token_terminator would need to track that.

> Agreed. Fortunately (or unfortunately?) I think the JSON
> client-encoding work is now a prerequisite for OAuth in libpq, so
> hopefully some improvements can fall out of that work too.

I'm afraid so. I don't quite see how this would be OK to tweak on
stable branches, but all areas that could report error states with
partial byte sequence contents would benefit from such a change.
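
To illustrate the kind of tracking involved, here is a minimal sketch,
not the committed fix and not code from jsonapi.c. It assumes UTF-8
input and uses two hypothetical helpers, utf8_expected_len() and
reportable_len(), to compute how many bytes of a token can be quoted in
an error message without ending on a truncated multibyte sequence:

```c
#include <stddef.h>

/*
 * Hypothetical helper (not a PostgreSQL function): expected length of a
 * UTF-8 sequence given its lead byte, or 0 if the byte cannot start one.
 */
static size_t
utf8_expected_len(unsigned char lead)
{
	if (lead < 0x80)
		return 1;
	if ((lead & 0xE0) == 0xC0)
		return 2;
	if ((lead & 0xF0) == 0xE0)
		return 3;
	if ((lead & 0xF8) == 0xF0)
		return 4;
	return 0;					/* continuation byte or invalid lead */
}

/*
 * Hypothetical helper: number of leading bytes of s[0..len) that can be
 * reported without ending in a truncated multibyte sequence.  It only
 * checks the tail for truncation; it does not validate the whole string.
 */
static size_t
reportable_len(const char *s, size_t len)
{
	size_t		i = len;
	size_t		need;
	size_t		have;

	if (len == 0)
		return 0;

	/* Step back over at most three trailing continuation bytes. */
	while (i > 0 && (len - i) < 3 &&
		   ((unsigned char) s[i - 1] & 0xC0) == 0x80)
		i--;

	if (i == 0)
		return len;				/* no lead byte found; nothing to trim */

	/* s[i - 1] is the candidate lead byte of the final sequence. */
	need = utf8_expected_len((unsigned char) s[i - 1]);
	have = len - (i - 1);

	/* Drop the final sequence only if it is incomplete. */
	if (need > have)
		return i - 1;
	return len;
}
```

A caller would clamp the reported token length with
reportable_len(token_start, token_len) before building the error
message. Any real change would have to go through the encoding-aware
routines already used by the lexer rather than hard-coding UTF-8.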

>> Thoughts and/or objections?
>
> None here.

This is a bit mitigated by the fact that d6607016c738 is recent, but
the behavior has been incorrect since v13, so I have backpatched the
fix down to that.
--
Michael
