From: | Karina Litskevich <litskevichkarina(at)gmail(dot)com> |
---|---|
To: | Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com> |
Cc: | Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Invalid "trailing junk" error message when non-English letters are used |
Date: | 2024-08-28 10:00:33 |
Message-ID: | CACiT8iZb+o_=MGJ72qM8HOuam0-hv09bj8-9x-XsODH1PrwP_A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Aug 28, 2024 at 12:06 AM Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>
wrote:
> I see the following compile time warnings:
> scan.l:1062: warning, rule cannot be matched
> scan.l:1066: warning, rule cannot be matched
> scan.l:1070: warning, rule cannot be matched
> pgc.l:1030: warning, rule cannot be matched
> pgc.l:1033: warning, rule cannot be matched
> pgc.l:1036: warning, rule cannot be matched
> psqlscan.l:905: warning, rule cannot be matched
> psqlscan.l:908: warning, rule cannot be matched
> psqlscan.l:911: warning, rule cannot be matched
>
Thanks for the feedback!
I somehow missed these warnings, my bad. The problem is with queries like
"select 0x12junk;". In master "0x" matches decinteger_junk token and
"0x12j" matches hexinteger_junk token and flex chooses the longest match,
no conflict. But with the patch "0x12junk" matches both decinteger_junk
(decinteger "0" + identifier "x12junk") and hexinteger_junk (hexinteger
"0x12" + identifier "junk"). Since any match to hexinteger_junk also
matches decinteger_junk, and the rule for hexinteger_junk is below the
rule for decinteger_junk, it's never reached.
I see the two solutions here: either move the rule for decinteger_junk
below the rules for hexinteger_junk, octinteger_junk and bininteger_junk,
or just use a single rule decinteger_junk for all these cases, since the
error message is the same anyway. I implemented the latter in the second
version of the patch, also renamed this common rule to integer_junk.
Additionally, I noticed that this patch is going to change error messages
in some cases, though I don't think it's a big deal. Example:
Without patch:
postgres=# select 0xyz;
ERROR: invalid hexadecimal integer at or near "0x"
With patch:
postgres=# select 0xyz;
ERROR: trailing junk after numeric literal at or near "0xyz"
> FWIW output of the whole string in the error message doesnt' look nice to
> me, but other places of code do this anyway e.g:
> select ('1'||repeat('p',1000000))::integer;
> This may be worth fixing.
>
That's interesting. I didn't know we could do that to create a long error
message. At first I thought that it's not a problem for error messages
from the scanner, since its "at or near" string cannot be longer than the
query typed in psql or written in a script file so it won't be enormously
big. But that's just not true, because we can send a generated query.
Something like that:
With patch:
postgres=# select 'select '||'1'||repeat('p',1000000) \gexec
ERROR: trailing junk after numeric literal at or near "1ppp<lots of p>"
And another query that leads to this without patch:
postgres=# select 'select 1'||repeat('@',1000000)||'1' \gexec
ERROR: operator too long at or near "@@@<lots of @>"
It would be nice to prevent such long strings in error messages. Maybe a
GUC variable to set the maximum length for such strings could work. But
how do we determine all places where it is needed?
Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/
Attachment | Content-Type | Size |
---|---|---|
v2-0001-Improve-error-message-for-rejecting-trailing-junk.patch | text/x-patch | 8.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Aleksander Alekseev | 2024-08-28 10:04:38 | Re: Add const qualifiers to XLogRegister*() functions |
Previous Message | Amit Kapila | 2024-08-28 09:38:27 | Re: Doc: fix the note related to the GUC "synchronized_standby_slots" |