Quick Links

Re: benchmarking Flex practices

From:	John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: benchmarking Flex practices
Date:	2019-11-26 11:38:00
Message-ID:	CACPNZCuGfVe+Wp+4N7rVfQ4GAYjuCHzSsx8NR8=2n_OTwU9Eow@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, Nov 26, 2019 at 5:51 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> [ My apologies for being so slow to get back to this ]

No worries -- it's a nice-to-have, not something our users are excited about.

> It struck me though that there's another solution we haven't discussed,
> and that's to make the token lookahead filter in parser.c do the work
> of converting UIDENT [UESCAPE SCONST] to IDENT, and similarly for the
> string case.

I recently tried again to get gram.y to handle it without precedence
hacks (or at least hacks with less mystery) and came to the conclusion
that maybe it just doesn't belong in the grammar after all. I hadn't
thought of any alternatives, so thanks for working on that!

It seems something is not quite right in v9 with the error position reporting:

SELECT U&'wrong: +0061' UESCAPE '+';
ERROR: invalid Unicode escape character at or near "'+'"
LINE 1: SELECT U&'wrong: +0061' UESCAPE '+';
- ^
+ ^

The caret is not pointing to the third token, or the second for that
matter. What worked for me was un-truncating the current token before
calling yylex again. To see if I'm on the right track, I've included
this in the attached, which applies on top of your v9.

> Generally, I'm pretty happy with this approach: it touches gram.y
> hardly at all, and it removes just about all of the complexity from
> scan.l. I'm happier about dropping the support code into parser.c
> than the other choices we've discussed.

Seems like the best of both worlds. If we ever wanted to ditch the
whole token filter and use Bison's %glr mode, we'd have extra work to
do, but there doesn't seem to be a rush to do so anyway.

> There's still undone work here, though:
>
> * I did not touch psql. Probably your patch is fine for that.
>
> * I did not do more with ecpg than get it to compile, using the
> same hacks as in your v7. It still fails its regression tests,
> but now the reason is that what we've done in parser/parser.c
> needs to be transposed into the identical functionality in
> ecpg/preproc/parser.c. Or at least some kind of functionality
> there. A problem with this approach is that it presumes we can
> reduce a UIDENT sequence to a plain IDENT, but to do so we need
> assumptions about the target encoding, and I'm not sure that
> ecpg should make any such assumptions. Maybe ecpg should just
> reject all cases that produce non-ASCII identifiers? (Probably
> it could be made to do something smarter with more work, but
> it's not clear to me that it's worth the trouble.)

Hmm, I thought we only allowed Unicode escapes in the first place if
the server encoding was UTF-8. Or did you mean something else?

> If this seems like a reasonable approach to you, please fill in
> the missing psql and ecpg bits.

Will do.

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
v9-addendum-handle-uescapes-in-parser.patch	application/octet-stream	581 bytes

In response to

Re: benchmarking Flex practices at 2019-11-25 22:51:41 from Tom Lane

Responses

Re: benchmarking Flex practices at 2019-11-26 15:32:29 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ranier Vilela	2019-11-26 12:13:16	[PATCH] Remove twice assignment with var pageop (nbtree.c).
Previous Message	Etsuro Fujita	2019-11-26 11:35:33	Re: A problem about partitionwise join