From: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: benchmarking Flex practices |
Date: | 2019-07-03 12:14:25 |
Message-ID: | CACPNZCvWPVC_RhNgSgQ7aaU38Ru4HXGZX4xusPDeFS1D0WSY5Q@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 3, 2019 at 5:35 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> writes:
> > 0001 is a small patch to remove some unneeded generality from the
> > current rules. This lowers the number of elements in the yy_transition
> > array from 37045 to 36201.
>
> I don't particularly like 0001. The two bits like this
>
> -whitespace ({space}+|{comment})
> +whitespace ({space}|{comment})
>
> seem likely to create performance problems for runs of whitespace, in that
> the lexer will now have to execute the associated action once per space
> character not just once for the whole run.
Okay.
> There are a bunch of higher-order productions that use "{whitespace}*",
> which is surely a bit redundant given the contents of {whitespace}.
> But maybe we could address that by replacing "{whitespace}*" with
> "{opt_whitespace}" defined as
>
> opt_whitespace ({space}*|{comment})
>
> Not sure what impact if any that'd have on table size, but I'm quite sure
> that {whitespace} was defined with an eye to avoiding unnecessary
> lexer action cycles.
It turns out that {opt_whitespace} as defined above is not equivalent
to {whitespace}* , since the former is either a single comment or a
single run of 0 or more whitespace chars (if I understand correctly).
Using {opt_whitespace} for the UESCAPE rules on top of v3-0002, the
regression tests pass, but queries like this fail with a syntax error:
# select U&'d!0061t!+000061' uescape --comment
'!';
There was in fact a substantial size reduction, though, so for
curiosity's sake I tried just replacing {whitespace}* with {space}* in
the UESCAPE rules, and the table shrank from 30367 (that's with 0002
only) to 24661.
> As for the other two bits that are like
>
> -<xe>. {
> - /* This is only needed for \ just before EOF */
> +<xe>\\ {
>
> my recollection is that those productions are defined that way to avoid a
> flex warning about not all possible input characters being accounted for
> in the <xe> (resp. <xdolq>) state. Maybe that warning is
> flex-version-dependent, or maybe this was just a worry and not something
> that actually produced a warning ... but I'm hesitant to change it.
> If we ever did get to flex's default action, that action is to echo the
> current input character to stdout, which would be Very Bad.
FWIW, I tried Flex 2.5.35 and 2.6.4 with no warnings, and I did get a
warning when I deleted any of those two rules. I'll leave them out for
now, since this change was only good for ~500 fewer elements in the
transition array.
> As far as I can see, the point of 0002 is to have just one set of
> flex rules for the various variants of quotecontinue processing.
> That sounds OK, though I'm a bit surprised it makes this much difference
> in the table size. I would suggest that "state_before" needs a less
> generic name (maybe "state_before_xqs"?) and more than no comment.
> Possibly more to the point, it's not okay to have static state variables
> in the core scanner, so that variable needs to be kept in yyextra.
> (Don't remember offhand whether it's any more acceptable in the other
> scanners.)
Ah yes, I got this idea from the ECPG scanner, which is not reentrant. Will fix.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Anthony Nowocien | 2019-07-03 13:24:55 | Re: Conflict handling for COPY FROM |
Previous Message | Surafel Temesgen | 2019-07-03 11:42:00 | Re: Conflict handling for COPY FROM |