| From: | Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> |
|---|---|
| To: | Karina Litskevich <litskevichkarina(at)gmail(dot)com> |
| Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: Invalid "trailing junk" error message when non-English letters are used |
| Date: | 2024-08-27 21:06:24 |
| Message-ID: | CALT9ZEFG8u=+pBMkON1Ske+We6wtjf=A2SYGvhsZJn5TaHLwLA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi, Karina!
On Tue, 27 Aug 2024 at 19:06, Karina Litskevich <litskevichkarina(at)gmail(dot)com>
wrote:
> Hi hackers,
>
> When the error "trailing junk after numeric literal" occurs at a number
> followed by a symbol that is represented by more than one byte, that symbol
> is not displayed correctly in the error message. Only its first byte
> appears instead of the whole symbol. That makes the error message invalid
> UTF-8 (or whatever encoding is set). The whole log file where this error
> message goes also becomes invalid. That could lead to problems with
> reading logs. You can see an invalid message by trying "SELECT 123ä;".
>
> Rejecting trailing junk after numeric literals was introduced in commit
> 2549f066 to prevent scanning a number immediately followed by an
> identifier without whitespace as number and identifier. All the tokens
> that were made to catch such cases match a numeric literal and the next byte,
> and that is where the problem comes from. I thought that it could be fixed
> just by using tokens that match a numeric literal immediately followed by
> an identifier, not only one byte. This also improves error messages in
> cases with English letters. After these changes, for "SELECT 123abc;" the
> error message will say that the error appeared at or near "123abc" instead
> of "123a".
>
> I've attached the patch. Are there any pitfalls I can't see? It just keeps
> bothering me why it wasn't done this way from the beginning. Matching the whole
> identifier after a numeric literal just seems more obvious to me than
> matching its first byte.
>
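For readers following along, the byte-level problem described above can be reproduced outside the lexer. This is only a rough sketch in Python, not the flex rules from scan.l: the two regular expressions are my approximation of "numeric literal plus one byte" versus "numeric literal plus a whole identifier", and the helper names are made up for illustration.

```python
import re

def first_byte_token(query: bytes) -> bytes:
    """Digits plus exactly one following byte (approximates the one-byte junk match)."""
    m = re.match(rb"[0-9]+.", query, re.DOTALL)
    return m.group(0) if m else b""

def whole_identifier_token(query: bytes) -> bytes:
    """Digits plus the whole identifier-like run that follows (assumed character classes)."""
    m = re.match(rb"[0-9]+[A-Za-z_\x80-\xff][A-Za-z0-9_$\x80-\xff]*", query)
    return m.group(0) if m else b""

def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

query = "SELECT 123\u00e4;".encode("utf-8")[len("SELECT "):]  # b"123\xc3\xa4;"

short = first_byte_token(query)       # stops in the middle of the two-byte character
full = whole_identifier_token(query)  # keeps the character intact

print(short, is_valid_utf8(short))
print(full, is_valid_utf8(full))
print(first_byte_token(b"123abc;"), whole_identifier_token(b"123abc;"))
```

The first token ends after the lead byte of "ä", so it no longer decodes as UTF-8, while the second token carries the complete character and, for "123abc", the complete identifier.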
I see the following compile-time warnings:
scan.l:1062: warning, rule cannot be matched
scan.l:1066: warning, rule cannot be matched
scan.l:1070: warning, rule cannot be matched
pgc.l:1030: warning, rule cannot be matched
pgc.l:1033: warning, rule cannot be matched
pgc.l:1036: warning, rule cannot be matched
psqlscan.l:905: warning, rule cannot be matched
psqlscan.l:908: warning, rule cannot be matched
psqlscan.l:911: warning, rule cannot be matched
FWIW, printing the whole string in the error message doesn't look nice to
me, but other places in the code do this anyway, e.g.:
select ('1'||repeat('p',1000000))::integer;
This may be worth fixing.
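To illustrate what such a fix could look like in principle, here is a small Python sketch that clips a long value before it goes into a message, without cutting a multibyte character in half. It is purely hypothetical: the function name, the 64-byte limit, and the "..." suffix are my assumptions, not anything from the patch or from the existing error-reporting code.

```python
def clip_for_message(text: str, max_bytes: int = 64, encoding: str = "utf-8") -> str:
    """Clip text to at most max_bytes encoded bytes, ending on a character boundary."""
    raw = text.encode(encoding)
    if len(raw) <= max_bytes:
        return text
    # errors="ignore" drops a trailing incomplete multibyte sequence, if any.
    clipped = raw[:max_bytes].decode(encoding, errors="ignore")
    return clipped + "..."

long_value = "1" + "p" * 1000000
print(clip_for_message(long_value))
print(clip_for_message("\u00e4" * 40, max_bytes=5))
```

With a limit like this the message stays short and remains valid in the chosen encoding, at the cost of hiding the tail of the offending input.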
Regards,
Pavel Borisov
Supabase