Invalid "trailing junk" error message when non-English letters are used

From: Karina Litskevich <litskevichkarina(at)gmail(dot)com>
To: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Invalid "trailing junk" error message when non-English letters are used
Date: 2024-08-27 15:05:53
Message-ID: CACiT8iZ_diop=0zJ7zuY3BXegJpkKK1Av-PU7xh0EDYHsa5+=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

When error "trailing junk after numeric literal" occurs at a number
followed by a symbol that is presented by more than one byte, that symbol
in the error message is not displayed correctly. Instead of that symbol
there is only its first byte. That makes the error message an invalid
UTF-8 (or whatever encoding is set). The whole log file where this error
message goes also becomes invalid. That could lead to problems with
reading logs. You can see an invalid message by trying "SELECT 123ä;".

Rejecting trailing junk after numeric literals was introduced in commit
2549f066 to prevent scanning a number immediately followed by an
identifier without whitespace as number and identifier. All the tokens
that made to catch such cases match a numeric literal and the next byte,
and that is where the problem comes from. I thought that it could be fixed
just by using tokens that match a numeric literal immediately followed by
an identifier, not only one byte. This also improves error messages in
cases with English letters. After these changes, for "SELECT 123abc;" the
error message will say that the error appeared at or near "123abc" instead
of "123a".

I've attached the patch. Are there any pitfalls I can't see? It just keeps
bothering me why wasn't it done from the beginning. Matching the whole
identifier after a numeric literal just seems more obvious to me than
matching its first byte.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/

Attachment Content-Type Size
v1-0001-Improve-error-message-for-rejecting-trailing-junk.patch text/x-patch 6.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2024-08-27 15:15:46 Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Previous Message Laurenz Albe 2024-08-27 14:52:00 Re: proposal: schema variables