Re: [PATCH] json_lex_string: don't overread on bad UTF8

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date: 2024-05-02 02:23:13
Message-ID: ZjL5Ed6LDZGDGILj@paquier.xyz
Lists: pgsql-hackers

On Wed, May 01, 2024 at 04:22:24PM -0700, Jacob Champion wrote:
> On Tue, Apr 30, 2024 at 11:09 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> Not sure to like much the fact that this advances token_terminator
>> first. Wouldn't it be better to calculate pg_encoding_mblen() first,
>> then save token_terminator? I feel a bit uneasy about saving a value
>> in token_terminator past the end of the string. That's a nit in this
>> context, still..
>
> v2 tries it that way; see what you think. Is the concern that someone
> might add code later that escapes that macro early?

Yeah, I am not sure whether that would really happen, but it looks
like good practice to follow anyway, to keep a clean stack at any
time.
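
To make the ordering concrete, here is a minimal sketch of "compute the
length first, then save token_terminator". It is not the v2 patch:
sketch_lex, sketch_utf8_mblen() and sketch_set_terminator() are
hypothetical names, with sketch_utf8_mblen() standing in for
pg_encoding_mblen() under the assumption of UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the lexer state, not the real JsonLexContext. */
typedef struct
{
    const char *input_end;          /* one past the last valid input byte */
    const char *token_terminator;   /* end of the token reported in errors */
} sketch_lex;

/*
 * Hypothetical stand-in for pg_encoding_mblen(): the length that a UTF-8
 * lead byte claims, without looking at any of the following bytes.
 */
static size_t
sketch_utf8_mblen(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 1;                       /* invalid lead byte: count it alone */
}

/*
 * Compute the claimed character length first, clamp it to the bytes that
 * are really available, and only then store token_terminator. The stored
 * pointer is never past input_end, even temporarily.
 * The caller must guarantee s < lex->input_end.
 */
static void
sketch_set_terminator(sketch_lex *lex, const char *s)
{
    size_t      remaining;
    size_t      len;

    assert(s < lex->input_end);
    remaining = (size_t) (lex->input_end - s);
    len = sketch_utf8_mblen((unsigned char) *s);

    if (len > remaining)
        len = remaining;            /* truncated multibyte sequence */

    lex->token_terminator = s + len;
    assert(lex->token_terminator <= lex->input_end);
}
```

With the three input bytes 'a', 0xE2, 0x82, calling
sketch_set_terminator() at the second byte stores a terminator equal to
input_end rather than one byte past it, even though the lead byte claims
a three-byte character.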

>> Ah, that makes sense. That looks OK here. A comment around the test
>> would be adapted to document that, I guess.
>
> Done.

That seems OK at a quick glance. I don't have much room to do
something about this patch this week, as a result of Golden Week and
the buildfarm effect, but I should be able to get to it next week once
the next round of minor releases is tagged.

About the fact that we may end up printing incomplete UTF-8
sequences, I'd be curious to hear your thoughts. Still, the information
provided about the partial byte sequences can also be useful for
debugging, on top of having the error code, no?
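
As an illustration of what I mean by useful for debugging, a rough
sketch of rendering the partial bytes as hex rather than emitting them
raw. sketch_hex_bytes() is a hypothetical helper made up for this mail,
not an existing function in the tree.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "0xNN 0xNN ..." for the given bytes into out.
 * If the buffer is too small, stop at the last byte that fits completely
 * instead of cutting one in half. out is always NUL-terminated when
 * outsz > 0.
 */
static void
sketch_hex_bytes(const unsigned char *bytes, size_t len,
                 char *out, size_t outsz)
{
    size_t      used = 0;

    if (outsz == 0)
        return;
    out[0] = '\0';

    for (size_t i = 0; i < len; i++)
    {
        int         n = snprintf(out + used, outsz - used, "%s0x%02x",
                                 i > 0 ? " " : "", (unsigned) bytes[i]);

        if (n < 0 || (size_t) n >= outsz - used)
        {
            out[used] = '\0';       /* drop the partially written byte */
            break;
        }
        used += (size_t) n;
    }
}
```

For the truncated sequence 0xE2 0x82 this yields the text "0xe2 0x82",
which stays readable whatever the client encoding is.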
--
Michael
