Re: Invalid "trailing junk" error message when non-English letters are used

From: Karina Litskevich <litskevichkarina(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Borisov <pashkin(dot)elfe(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Invalid "trailing junk" error message when non-English letters are used
Date: 2024-09-05 15:07:57
Message-ID: CACiT8iaf-ONO6sF0nC-9ZMpO7jq0eWUkEQXuVznegi3agGzWxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 5, 2024 at 1:52 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Karina Litskevich <litskevichkarina(at)gmail(dot)com> writes:
> > I see the two solutions here: either move the rule for decinteger_junk
> > below the rules for hexinteger_junk, octinteger_junk and
bininteger_junk,
> > or just use a single rule decinteger_junk for all these cases, since the
> > error message is the same anyway. I implemented the latter in the second
> > version of the patch, also renamed this common rule to integer_junk.
>
> That seems reasonable, but IMO this code was unacceptably
> undercommented before and what you've done has made it worse.
> We really need a comment block associated with the flex macros,
> perhaps along the lines of
>
> /*
> * An identifier immediately following a numeric literal is disallowed
> * because in some cases it's ambiguous what is meant: for example,
> * 0x1234 could be either a hexinteger or a decinteger "0" and an
> * identifier "x1234". We can detect such problems by seeing if
> * integer_junk matches a longer substring than any of the XXXinteger
> * patterns. (One "junk" pattern is sufficient because this will match
> * all the same strings we'd match with {hexinteger}{identifier} etc.)
> * Note that the rule for integer_junk must appear after the ones for
> * XXXinteger to make this work correctly.
> */

Thank you, this piece of code definitely needs a comment.

> (Hmm, actually, is that last sentence true? My flex is a bit rusty.)

Yes, the rule for integer_junk must appear after the ones for XXXinteger.
Here is a quote from
https://ftp.gnu.org/old-gnu/Manuals/flex-2.5.4/html_mono/flex.html
"If it finds more than one match, it takes the one matching the most text
(...). If it finds two or more matches of the same length, the rule listed
first in the flex input file is chosen."
For example, 0x123 is matched by both integer_junk and hexinteger, and we
want the rule for hexinteger to be chosen, so we should place it before
the rule for integer_junk.

> param_junk really needs a similar comment, or maybe we could put
> all the XXX_junk macros together and use one comment for all.

In v3 of the patch I grouped all the *_junk rules together and included
the suggested comment with a little added something.

Best regards,
Karina Litskevich
Postgres Professional: http://postgrespro.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karina Litskevich 2024-09-05 15:11:20 Re: Invalid "trailing junk" error message when non-English letters are used
Previous Message Andrew Dunstan 2024-09-05 15:01:49 Re: json_query conditional wrapper bug