From: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: benchmarking Flex practices |
Date: | 2019-12-03 11:02:17 |
Message-ID: | CACPNZCv9fWE9Q8yVwHgQ4=5y3RBi2CkzoG2MD8Q_3ftNVt0sCg@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Nov 26, 2019 at 10:32 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I haven't looked closely at what ecpg does with the processed
> identifiers. If it just spits them out as-is, a possible solution
> is to not do anything about de-escaping, but pass the sequence
> U&"..." (plus UESCAPE ... if any), just like that, on to the grammar
> as the value of the IDENT token.
It does pass them along as-is, so I did it that way.
In the attached v10, I've synced both ECPG and psql.
> * I haven't convinced myself either way as to whether it'd be
> better to factor out the code duplicated between the UIDENT
> and UCONST cases in base_yylex.
I chose to factor it out, since we have two versions of parser.c
(backend and ECPG), and this way was much easier to work with.
Some notes:
I arranged for the ECPG grammar to only see SCONST and IDENT. With
UCONST and UIDENT out of the way, it was a small additional step to
put all string reconstruction into the lexer, which has the advantage
of allowing removal of the other special-case ECPG string tokens as
well. The fewer special cases involved in pasting the grammar
together, the better. In doing so, I've probably introduced memory
leaks, but I wanted to get your opinion on the overall approach before
investigating.
In ECPG's parser.c, I simply copied check_uescapechar() and
ecpg_isspace(), but we could find a common place if desired. During
development, I found that this file replicates the location-tracking
logic in the backend, but doesn't seem to make use of it. I also would
have had to replicate the backend's datatype for YYLTYPE. Fixing that
might be worthwhile some day, but to get this working, I just ripped
out the extra location tracking.
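
For readers without the patch at hand, here is a minimal sketch of what the two copied helpers check. It is not the code from the attachment: the names sketch_isspace() and sketch_check_uescapechar() are hypothetical stand-ins, and the rule encoded is the documented one, namely that a UESCAPE character may not be a hexadecimal digit, a plus sign, a single or double quote, or whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for ecpg_isspace(): the whitespace characters
 * assumed here are space, tab, newline, carriage return and form feed.
 */
static bool
sketch_isspace(char ch)
{
	return ch == ' ' || ch == '\t' || ch == '\n' ||
		   ch == '\r' || ch == '\f';
}

/*
 * Hypothetical stand-in for check_uescapechar(): returns true if
 * 'escape' may be used as the escape character in UESCAPE '<char>'.
 */
static bool
sketch_check_uescapechar(unsigned char escape)
{
	if (isxdigit(escape) ||
		escape == '+' ||
		escape == '\'' ||
		escape == '"' ||
		sketch_isspace((char) escape))
		return false;

	return true;
}
```

Under these assumptions, '!' is accepted as an escape character, while 'a' (a hex digit), '+' and ' ' are rejected.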
I no longer use state variables to track scanner state, and in fact I
removed the existing "state_before" variable in ECPG. Instead, I used
the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state().
These have been part of Flex for a long time, it seems, so I think
we're fine as far as portability goes. I think it's cleaner this way,
and possibly faster. I also used this to reunite the xcc and xcsql states.
This whole part could be split out into a separate refactoring patch
to be applied first, if desired.
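
To illustrate the idea behind the start-condition stack, here is a small plain-C model of the behavior, not Flex itself and not code from the patch. In a real scanner these calls are generated by Flex when "%option stack" is set; the struct, the enum values and the sketch_* names below are all hypothetical. The point it tries to show is why one comment state can presumably serve both C and SQL mode: the state to return to is remembered on the stack rather than in a separate variable.

```c
#include <assert.h>

/* Hypothetical start conditions; the real scanner has more. */
enum sketch_state { ST_INITIAL, ST_C, ST_SQL, ST_COMMENT };

#define SKETCH_MAX_DEPTH 16

struct sketch_scanner
{
	enum sketch_state current;	/* what BEGIN() would have set */
	enum sketch_state stack[SKETCH_MAX_DEPTH];
	int			depth;
};

/* Models yy_push_state(): save the current state, then switch. */
static void
sketch_push_state(struct sketch_scanner *s, enum sketch_state next)
{
	assert(s->depth < SKETCH_MAX_DEPTH);
	s->stack[s->depth++] = s->current;
	s->current = next;
}

/* Models yy_pop_state(): switch back to the most recently saved state. */
static void
sketch_pop_state(struct sketch_scanner *s)
{
	assert(s->depth > 0);
	s->current = s->stack[--s->depth];
}

/* Models yy_top_state(): peek at the saved state without popping. */
static enum sketch_state
sketch_top_state(const struct sketch_scanner *s)
{
	assert(s->depth > 0);
	return s->stack[s->depth - 1];
}
```

With this model, entering a comment from either ST_C or ST_SQL is the same push of ST_COMMENT, and leaving it is a single pop that lands back in whichever state was active before.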
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
v10-handle-uescapes-in-parser.patch | application/octet-stream | 61.6 KB |