
Re: [PATCH] json_lex_string: don't overread on bad UTF8

From:	Michael Paquier <michael(at)paquier(dot)xyz>
To:	Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: [PATCH] json_lex_string: don't overread on bad UTF8
Date:	2024-05-02 03:39:40
Message-ID:	ZjMK_N0VokrEe1Ws@paquier.xyz
Lists:	pgsql-hackers

On Thu, May 02, 2024 at 11:23:13AM +0900, Michael Paquier wrote:
> About the fact that we may finish by printing unfinished UTF-8
> sequences, I'd be curious to hear your thoughts. Now, the information
> provided about the partial byte sequences can be also useful for
> debugging on top of having the error code, no?

By the way, as long as I have that in mind... I am not sure that it is
worth spending cycles on detecting the unfinished sequences and making
these printable. Wouldn't it be enough for most cases to adjust
token_error() to truncate the byte sequences we cannot print?
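
To illustrate the truncation idea, here is a minimal standalone sketch.
It is not the actual token_error() code: the helper names utf8_seq_len()
and utf8_printable_prefix_len() are hypothetical, and the check is only
structural (lead byte plus continuation bytes), so overlong encodings and
surrogates are not rejected. It returns how many leading bytes of a
buffer form complete sequences, so that a caller could print only that
prefix.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Length of the UTF-8 sequence announced by a lead byte, or 0 if the
 * byte cannot start a sequence (a stray continuation byte, or 0xF8 and up).
 */
static size_t
utf8_seq_len(unsigned char lead)
{
	if (lead < 0x80)
		return 1;
	if ((lead & 0xE0) == 0xC0)
		return 2;
	if ((lead & 0xF0) == 0xE0)
		return 3;
	if ((lead & 0xF8) == 0xF0)
		return 4;
	return 0;
}

/*
 * Number of leading bytes in buf[0..len) that consist only of complete
 * sequences.  Anything from the first incomplete or malformed sequence
 * onward is left out, so the prefix is safe to print.
 */
static size_t
utf8_printable_prefix_len(const char *buf, size_t len)
{
	size_t		pos = 0;

	assert(buf != NULL || len == 0);

	while (pos < len)
	{
		size_t		need = utf8_seq_len((unsigned char) buf[pos]);
		size_t		i;

		/* Invalid lead byte, or the sequence runs past the buffer end. */
		if (need == 0 || need > len - pos)
			break;

		/* Every following byte must look like 10xxxxxx. */
		for (i = 1; i < need; i++)
		{
			if (((unsigned char) buf[pos + i] & 0xC0) != 0x80)
				break;
		}
		if (i < need)
			break;

		pos += need;
	}

	return pos;
}
```

A caller would then report only the first utf8_printable_prefix_len()
bytes of the token, for example with a "%.*s" format.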

Another thing that I think would be nice would be to calculate the
location of what we're parsing on a given line, and provide that in
the error context. That would not be backpatchable as it requires a
change in JsonLexContext, unfortunately, but it would help in making
more sense of an error when the incomplete byte sequence is at the
beginning of a token or after an expected character.
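
As a rough sketch of what computing such a location could look like,
outside of the lexer: the ErrorLocation struct and locate_offset() below
are hypothetical names, not part of JsonLexContext, and the column is
counted in bytes rather than characters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1-based position of a byte offset within the input. */
typedef struct ErrorLocation
{
	size_t		line;
	size_t		column;		/* in bytes, not characters */
} ErrorLocation;

/*
 * Walk the input up to "offset", counting newlines to find the line and
 * the byte column where the failing token starts.  An offset past the
 * end of the input is clamped to the end.
 */
static ErrorLocation
locate_offset(const char *input, size_t len, size_t offset)
{
	ErrorLocation loc = {1, 1};
	size_t		i;

	assert(input != NULL || len == 0);

	if (offset > len)
		offset = len;

	for (i = 0; i < offset; i++)
	{
		if (input[i] == '\n')
		{
			loc.line++;
			loc.column = 1;
		}
		else
			loc.column++;
	}

	return loc;
}
```

A real lexer would more likely track the start of the current line as it
goes instead of rescanning the input when an error is raised.
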
--
Michael

In response to

Re: [PATCH] json_lex_string: don't overread on bad UTF8 at 2024-05-02 02:23:13 from Michael Paquier

Responses

Re: [PATCH] json_lex_string: don't overread on bad UTF8 at 2024-05-02 23:29:18 from Jacob Champion
