From: | John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: benchmarking Flex practices |
Date: | 2019-11-26 11:38:00 |
Message-ID: | CACPNZCuGfVe+Wp+4N7rVfQ4GAYjuCHzSsx8NR8=2n_OTwU9Eow@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Nov 26, 2019 at 5:51 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> [ My apologies for being so slow to get back to this ]
No worries -- it's a nice-to-have, not something our users are excited about.
> It struck me though that there's another solution we haven't discussed,
> and that's to make the token lookahead filter in parser.c do the work
> of converting UIDENT [UESCAPE SCONST] to IDENT, and similarly for the
> string case.
I recently tried again to get gram.y to handle it without precedence
hacks (or at least hacks with less mystery) and came to the conclusion
that maybe it just doesn't belong in the grammar after all. I hadn't
thought of any alternatives, so thanks for working on that!
It seems something is not quite right in v9 with the error position reporting:
SELECT U&'wrong: +0061' UESCAPE '+';
ERROR: invalid Unicode escape character at or near "'+'"
LINE 1: SELECT U&'wrong: +0061' UESCAPE '+';
- ^
+ ^
The caret is not pointing to the third token, or the second for that
matter. What worked for me was un-truncating the current token before
calling yylex again. To see if I'm on the right track, I've included
this in the attached, which applies on top of your v9.
> Generally, I'm pretty happy with this approach: it touches gram.y
> hardly at all, and it removes just about all of the complexity from
> scan.l. I'm happier about dropping the support code into parser.c
> than the other choices we've discussed.
Seems like the best of both worlds. If we ever wanted to ditch the
whole token filter and use Bison's %glr mode, we'd have extra work to
do, but there doesn't seem to be a rush to do so anyway.
> There's still undone work here, though:
>
> * I did not touch psql. Probably your patch is fine for that.
>
> * I did not do more with ecpg than get it to compile, using the
> same hacks as in your v7. It still fails its regression tests,
> but now the reason is that what we've done in parser/parser.c
> needs to be transposed into the identical functionality in
> ecpg/preproc/parser.c. Or at least some kind of functionality
> there. A problem with this approach is that it presumes we can
> reduce a UIDENT sequence to a plain IDENT, but to do so we need
> assumptions about the target encoding, and I'm not sure that
> ecpg should make any such assumptions. Maybe ecpg should just
> reject all cases that produce non-ASCII identifiers? (Probably
> it could be made to do something smarter with more work, but
> it's not clear to me that it's worth the trouble.)
Hmm, I thought we only allowed Unicode escapes in the first place if
the server encoding was UTF-8. Or did you mean something else?
> If this seems like a reasonable approach to you, please fill in
> the missing psql and ecpg bits.
Will do.
--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
v9-addendum-handle-uescapes-in-parser.patch | application/octet-stream | 581 bytes |
From | Date | Subject | |
---|---|---|---|
Next Message | Ranier Vilela | 2019-11-26 12:13:16 | [PATCH] Remove twice assignment with var pageop (nbtree.c). |
Previous Message | Etsuro Fujita | 2019-11-26 11:35:33 | Re: A problem about partitionwise join |