Re: Invalid "trailing junk" error message when non-English letters are used

From: Karina Litskevich <litskevichkarina(at)gmail(dot)com>
To: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Invalid "trailing junk" error message when non-English letters are used
Date: 2024-08-28 10:00:33
Message-ID: CACiT8iZb+o_=MGJ72qM8HOuam0-hv09bj8-9x-XsODH1PrwP_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 28, 2024 at 12:06 AM Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
wrote:

> I see the following compile time warnings:
> scan.l:1062: warning, rule cannot be matched
> scan.l:1066: warning, rule cannot be matched
> scan.l:1070: warning, rule cannot be matched
> pgc.l:1030: warning, rule cannot be matched
> pgc.l:1033: warning, rule cannot be matched
> pgc.l:1036: warning, rule cannot be matched
> psqlscan.l:905: warning, rule cannot be matched
> psqlscan.l:908: warning, rule cannot be matched
> psqlscan.l:911: warning, rule cannot be matched
>

Thanks for the feedback!

I somehow missed these warnings, my bad. The problem is with queries like
"select 0x12junk;". In master "0x" matches decinteger_junk token and
"0x12j" matches hexinteger_junk token and flex chooses the longest match,
no conflict. But with the patch "0x12junk" matches both decinteger_junk
(decinteger "0" + identifier "x12junk") and hexinteger_junk (hexinteger
"0x12" + identifier "junk"). Since any match to hexinteger_junk also
matches decinteger_junk, and the rule for hexinteger_junk is below the
rule for decinteger_junk, it's never reached.

I see the two solutions here: either move the rule for decinteger_junk
below the rules for hexinteger_junk, octinteger_junk and bininteger_junk,
or just use a single rule decinteger_junk for all these cases, since the
error message is the same anyway. I implemented the latter in the second
version of the patch, also renamed this common rule to integer_junk.

Additionally, I noticed that this patch is going to change error messages
in some cases, though I don't think it's a big deal. Example:

Without patch:
postgres=# select 0xyz;
ERROR: invalid hexadecimal integer at or near "0x"

With patch:
postgres=# select 0xyz;
ERROR: trailing junk after numeric literal at or near "0xyz"

> FWIW output of the whole string in the error message doesnt' look nice to
> me, but other places of code do this anyway e.g:
> select ('1'||repeat('p',1000000))::integer;
> This may be worth fixing.
>

That's interesting. I didn't know we could do that to create a long error
message. At first I thought that it's not a problem for error messages
from the scanner, since its "at or near" string cannot be longer than the
query typed in psql or written in a script file so it won't be enormously
big. But that's just not true, because we can send a generated query.
Something like that:

With patch:
postgres=# select 'select '||'1'||repeat('p',1000000) \gexec
ERROR: trailing junk after numeric literal at or near "1ppp<lots of p>"

And another query that leads to this without patch:
postgres=# select 'select 1'||repeat('@',1000000)||'1' \gexec
ERROR: operator too long at or near "@@@<lots of @>"

It would be nice to prevent such long strings in error messages. Maybe a
GUC variable to set the maximum length for such strings could work. But
how do we determine all places where it is needed?

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/

Attachment Content-Type Size
v2-0001-Improve-error-message-for-rejecting-trailing-junk.patch text/x-patch 8.0 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2024-08-28 10:04:38 Re: Add const qualifiers to XLogRegister*() functions
Previous Message Amit Kapila 2024-08-28 09:38:27 Re: Doc: fix the note related to the GUC "synchronized_standby_slots"