pgsql: Split psql's lexer into two separate .l files for SQL and backsl

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Split psql's lexer into two separate .l files for SQL and backsl
Date: 2016-03-19 04:25:07
Message-ID: E1ah8Rn-0008MS-Gi@gemulon.postgresql.org
Lists: pgsql-committers

Split psql's lexer into two separate .l files for SQL and backslash cases.

This gets us to a point where psqlscan.l can be used by other frontend
programs for the same purpose psql uses it for, i.e., to detect when it's
collected a complete SQL command from input that is divided across
line boundaries. Moreover, other programs can supply their own lexers
for backslash commands of their own choosing. A follow-on patch will
use this in pgbench.
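
To make "detect when it's collected a complete SQL command" concrete, here is
a minimal sketch in C. It is not psqlscan.l's logic: ToyScanState,
toy_scan_init and toy_scan_line are hypothetical names, and the only rule
modeled is that a semicolon inside a single-quoted literal does not end the
command. The real lexer also has to cope with comments, dollar quoting,
parentheses and more. The point is only that the quoting state must persist
across calls, because a literal can span input lines.

```c
#include <stdbool.h>

/* Hypothetical toy state; the real PsqlScanState tracks far more. */
typedef struct ToyScanState
{
    bool in_quote;      /* inside a single-quoted literal that may span lines */
} ToyScanState;

void
toy_scan_init(ToyScanState *st)
{
    st->in_quote = false;
}

/*
 * Feed one input line.  Returns true if an unquoted ';' was seen on this
 * line, meaning a complete command has been collected.  Scanning stops at
 * that point; the rest of the line is ignored in this toy version.
 */
bool
toy_scan_line(ToyScanState *st, const char *line)
{
    for (const char *p = line; *p != '\0'; p++)
    {
        if (*p == '\'')
            st->in_quote = !st->in_quote;   /* a doubled '' toggles twice */
        else if (*p == ';' && !st->in_quote)
            return true;
    }
    return false;
}
```

A caller would keep feeding lines to toy_scan_line() with the same state
until it returns true, then send the accumulated text to the server.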

The end result here is roughly the same as in Kyotaro Horiguchi's
0001-Make-SQL-parser-part-of-psqlscan-independent-from-ps.patch, although
the details of the method for switching between lexers are quite different.
Basically, in this patch we share the entire PsqlScanState, YY_BUFFER_STATE
stack, *and* yyscan_t between different lexers. The only thing we need
to do to switch to a different lexer is to make sure the start_state is
valid for the new lexer. This works because flex doesn't keep any other
persistent state that depends on the specific lexing tables generated for
a particular .l file. (We are assuming that both lexers are built with
the same flex version, or at least versions that are compatible with
respect to the contents of yyscan_t; but that doesn't seem likely to
be a big problem in practice, considering how slowly flex changes.)
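
The switching scheme can be pictured with a toy model in C. Nothing here is
flex-generated or taken from the patch: ToySharedState, toy_sql_lex,
toy_slash_lex and the two start-state constants are hypothetical stand-ins.
Under those assumptions, the sketch shows two lexers with different rules
working on one shared state object, where the only step needed on a switch
is to put start_state at a value the new lexer understands.

```c
#include <stddef.h>

#define TOY_INITIAL         0   /* start state every toy lexer understands */
#define TOY_SQL_ONLY_STATE  3   /* meaningful to the SQL-side lexer only */

/* Stand-in for the shared PsqlScanState / buffer stack / yyscan_t. */
typedef struct ToySharedState
{
    const char *buf;            /* shared input buffer */
    size_t      pos;            /* shared read position */
    int         start_state;    /* interpreted by whichever lexer runs next */
} ToySharedState;

/* SQL-side lexer: consume text up to (not including) a backslash. */
size_t
toy_sql_lex(ToySharedState *st)
{
    size_t      begin = st->pos;

    while (st->buf[st->pos] != '\0' && st->buf[st->pos] != '\\')
        st->pos++;
    return st->pos - begin;
}

/* Backslash-side lexer: consume the backslash and the command word. */
size_t
toy_slash_lex(ToySharedState *st)
{
    size_t      begin;

    /* The one thing required on a switch: a start state valid for us. */
    st->start_state = TOY_INITIAL;

    begin = st->pos;
    if (st->buf[st->pos] == '\\')
        st->pos++;
    while (st->buf[st->pos] != '\0' &&
           st->buf[st->pos] != ' ' &&
           st->buf[st->pos] != '\n')
        st->pos++;
    return st->pos - begin;
}
```

Because the buffer and read position live in the shared object, the second
lexer resumes exactly where the first stopped, with no copying of input.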

Aside from being more efficient than Horiguchi-san's original solution,
this avoids possible corner-case changes in semantics: the original code
was capable of popping the input buffer stack while still staying in
backslash-related parsing states. I'm not sure that that equates to any
useful user-visible behaviors, but I'm not sure it doesn't either, so
I'm loath to assume that we only need to consider the topmost buffer when
parsing a backslash command.

I've attempted to update the MSVC build scripts for the added .l file,
but will rely on the buildfarm to see if I missed anything.

Kyotaro Horiguchi and Tom Lane

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/0ea9efbe9ec1bf07cc6ae070bdd54700af08e44d

Modified Files
--------------
src/bin/psql/.gitignore      |   1 +
src/bin/psql/Makefile        |  15 +-
src/bin/psql/command.c       |   2 +-
src/bin/psql/nls.mk          |   3 +-
src/bin/psql/psqlscan.h      |  22 +-
src/bin/psql/psqlscan.l      | 813 +++++--------------------------------------
src/bin/psql/psqlscan_int.h  | 129 +++++++
src/bin/psql/psqlscanslash.h |  35 ++
src/bin/psql/psqlscanslash.l | 735 ++++++++++++++++++++++++++++++++++++++
src/bin/psql/variables.c     |   2 +-
src/tools/msvc/Mkvcbuild.pm  |   2 +-
src/tools/msvc/clean.bat     |   1 +
12 files changed, 1005 insertions(+), 755 deletions(-)
