| From: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> | 
|---|---|
| To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: benchmarking Flex practices | 
| Date: | 2019-07-12 07:35:57 | 
| Message-ID: | CACPNZCsuAM1vyBo8jgTRBhOz+bht1=GcxceMoxwNGCwZxUGBtA@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Wed, Jul 10, 2019 at 3:15 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> writes:
> > [ v4 patches for trimming lexer table size ]
>
> I reviewed this and it looks pretty solid.  One gripe I have is
> that I think it's best to limit backup-prevention tokens such as
> quotecontinuefail so that they match only exact prefixes of their
> "success" tokens.  This seems clearer to me, and in at least some cases
> it can save a few flex states.  The attached v5 patch does it like that
> and gets us down to 22331 states (from 23696).  In some places it looks
> like you did that to avoid writing an explicit "{other}" match rule for
> an exclusive state, but I think it's better for readability and
> separation of concerns to go ahead and have those explicit rules
> (and it seems to make no difference table-size-wise).
Looks good to me.
> We still need to propagate these changes into the psql and ecpg lexers,
> but I assume you were waiting to agree on the core patch before touching
> those.  If you're good with the changes I made here, have at it.
I just made a couple additional cosmetic adjustments that made sense
when diff'ing with the other scanners. Make check-world passes. Some
notes:
The pre-existing ecpg var "state_before" was a bit confusing when
combined with the new var "state_before_quote_stop", and the former is
also used with C-comments, so I decided to go with
"state_before_lit_start" and "state_before_lit_stop". Even though
comments aren't literals, it's less of a stretch than referring to
quotes. To keep things consistent, I went with the latter var in psql
and core.
To get the regression tests to pass, I had to add this:
 psql_scan_in_quote(PsqlScanState state)
 {
- return state->start_state != INITIAL;
+ return state->start_state != INITIAL &&
+ state->start_state != xqs;
 }
...otherwise with parens we sometimes don't get the right prompt and
we get empty lines echoed. Adding xuend and xuchar here didn't seem to
make a difference. There might be something subtle I'm missing, so I
thought I'd mention it.
With the unicode escape rules brought over, the diff to the ecpg
scanner is much cleaner now. The diff for C-comment rules were still
pretty messy in comparison, so I made an attempt to clean that up in
0002. A bit off-topic, but I thought I should offer that while it was
fresh in my head.
-- 
John Naylor                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
| Attachment | Content-Type | Size | 
|---|---|---|
| v6-0001-Reduce-the-number-of-states-in-the-core-scanner-t.patch | application/octet-stream | 35.2 KB | 
| v6-0002-Merge-ECPG-scanner-states-for-C-style-comments.patch | application/octet-stream | 4.4 KB | 
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2019-07-12 08:01:41 | Re: Comment fix of config_default.pl | 
| Previous Message | Kyotaro Horiguchi | 2019-07-12 07:10:16 | Re: Remove page-read callback from XLogReaderState. |