Re: Invalid "trailing junk" error message when non-English letters are used

From: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
To: Karina Litskevich <litskevichkarina(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Invalid "trailing junk" error message when non-English letters are used
Date: 2024-08-27 21:06:24
Message-ID: CALT9ZEFG8u=+pBMkON1Ske+We6wtjf=A2SYGvhsZJn5TaHLwLA@mail.gmail.com
Lists: pgsql-hackers

Hi, Karina!

On Tue, 27 Aug 2024 at 19:06, Karina Litskevich <litskevichkarina(at)gmail(dot)com>
wrote:

> Hi hackers,
>
> When the error "trailing junk after numeric literal" occurs at a number
> followed by a symbol that is represented by more than one byte, that symbol
> in the error message is not displayed correctly. Instead of that symbol
> there is only its first byte. That makes the error message invalid
> UTF-8 (or whatever encoding is set). The whole log file where this error
> message goes also becomes invalid. That could lead to problems with
> reading logs. You can see an invalid message by trying "SELECT 123ä;".
>
> Rejecting trailing junk after numeric literals was introduced in commit
> 2549f066 to prevent scanning a number immediately followed by an
> identifier without whitespace as number and identifier. All the tokens
> that were made to catch such cases match a numeric literal and the next byte,
> and that is where the problem comes from. I thought that it could be fixed
> just by using tokens that match a numeric literal immediately followed by
> an identifier, not only one byte. This also improves error messages in
> cases with English letters. After these changes, for "SELECT 123abc;" the
> error message will say that the error appeared at or near "123abc" instead
> of "123a".
>
> I've attached the patch. Are there any pitfalls I can't see? It just keeps
> bothering me why it wasn't done from the beginning. Matching the whole
> identifier after a numeric literal just seems more obvious to me than
> matching its first byte.
>
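
To make the reported behaviour concrete, here is a minimal Python sketch. It
is not the lexer code: the two regular expressions are simplified,
hypothetical stand-ins for the "literal plus one byte" rule and the proposed
"literal plus whole identifier" rule, and the names ONE_BYTE_JUNK,
WHOLE_IDENT_JUNK, junk_token and is_valid_utf8 are made up for illustration.
It assumes the query is UTF-8 encoded.

```python
import re

# Hypothetical stand-ins for the scanner rules (not the actual flex patterns).
# A byte counts as an identifier start if it is an ASCII letter, an
# underscore, or any byte with the high bit set.
ONE_BYTE_JUNK = re.compile(rb"[0-9]+[A-Za-z_\x80-\xff]")
WHOLE_IDENT_JUNK = re.compile(
    rb"[0-9]+[A-Za-z_\x80-\xff][A-Za-z0-9_$\x80-\xff]*"
)


def junk_token(pattern, query):
    """Return the bytes the given rule would report, or None if no match."""
    match = pattern.search(query)
    return match.group(0) if match else None


def is_valid_utf8(data):
    """True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


query = "SELECT 123ä;".encode("utf-8")

# One-byte rule: the token ends after the first byte of the two-byte "ä".
short_token = junk_token(ONE_BYTE_JUNK, query)
print(short_token, is_valid_utf8(short_token))

# Whole-identifier rule: the token contains the complete character.
full_token = junk_token(WHOLE_IDENT_JUNK, query)
print(full_token, is_valid_utf8(full_token))
```

With the one-byte rule the reported token stops in the middle of a multibyte
character, which is what makes the message (and the log) invalid in the
client encoding; with the whole-identifier rule the token ends on a character
boundary.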

I see the following compile-time warnings with the patch applied:
scan.l:1062: warning, rule cannot be matched
scan.l:1066: warning, rule cannot be matched
scan.l:1070: warning, rule cannot be matched
pgc.l:1030: warning, rule cannot be matched
pgc.l:1033: warning, rule cannot be matched
pgc.l:1036: warning, rule cannot be matched
psqlscan.l:905: warning, rule cannot be matched
psqlscan.l:908: warning, rule cannot be matched
psqlscan.l:911: warning, rule cannot be matched

FWIW, output of the whole string in the error message doesn't look nice to
me, but other places in the code do this anyway, e.g.:
select ('1'||repeat('p',1000000))::integer;
This may be worth fixing.

Regards,
Pavel Borisov
Supabase
