From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Some surprising precedence behavior in PG's grammar |
Date: | 2011-05-04 23:39:54 |
Message-ID: | 19623.1304552394@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
While looking at the grammar's operator-precedence declarations in
connection with a recent pgsql-docs question, it struck me that this
declaration is a foot-gun waiting to go off:
%nonassoc IS NULL_P TRUE_P FALSE_P UNKNOWN /* sets precedence for IS NULL, etc */
The only terminal that we actually need to attach precedence to for
IS NULL and related productions is "IS". The others are just listed
there to save attaching explicit %prec declarations to those productions.
This seems like a bad idea, because attaching a precedence to a terminal
symbol that doesn't absolutely have to have one is just asking for
trouble: it can cause bison to accept grammars that would better have
provoked a shift/reduce error, and to silently resolve the ambiguity in
a way that you maybe didn't expect.
So I thought to myself that it'd be better to remove the unnecessary
precedence markings, and tried, with the attached patch. And behold,
I got a shift/reduce conflict:
state 2788
1569 b_expr: b_expr qual_Op . b_expr
1571 | b_expr qual_Op .
NULL_P shift, and go to state 1010
NULL_P [reduce using rule 1571 (b_expr)]
So the god of unintended consequences has been here already. What this
is telling us is that in examples such as
CREATE TABLE foo (f1 int DEFAULT 10 %% NULL);
it is not immediately clear to bison whether to shift upon seeing the
NULL (which leads to a parse tree that says %% is an infix operator with
arguments 10 and NULL), or to reduce (which leads to a parse tree that
says that %% is a postfix operator with argument 10, and NULL is a
column declaration constraint separate from the DEFAULT expression).
If you try the experiment, you find out that the first interpretation is
preferred by the current grammar:
ERROR: operator does not exist: integer %% unknown
Now, this is probably a good thing, because NULL as a column declaration
constraint is not SQL standard (only NOT NULL is), so we're resolving
the ambiguity in a way that's more likely to be SQL-compatible. But it
could be surprising to somebody who expected the other behavior,
especially since this seemingly-closely-related command is parsed the
other way:
CREATE TABLE foo (f1 int DEFAULT 10 %% NOT NULL);
ERROR: operator does not exist: integer %%
And the reason for that difference in behavior is that NOT has a
declared precedence lower than POSTFIXOP, whereas NULL has a declared
precedence that's higher. That comparison determines how bison resolves
the shift/reduce conflict.
The fact that this behavior falls out of a precedence declaration that's
seemingly quite unrelated, and was surely not designed with this case in
mind, is a perfect example of why I say that precedence declarations are
hazardous.
So I'd still like to get rid of the precedence markings for TRUE_P,
FALSE_P, and UNKNOWN, and will do so unless somebody has a really good
reason not to. (It looks like we could avoid marking ZONE, too.) But
I would be happier if we could also not mark NULL, because that's surely
used in a lot of other places, and could easily bite us a lot harder
than this. Can anyone think of an alternative way to resolve this
particular conflict without the blunt instrument of a precedence marking?
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
de-precedent.patch | text/x-patch | 3.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Smith | 2011-05-04 23:45:43 | Re: Unlogged vs. In-Memory |
Previous Message | Andrew Dunstan | 2011-05-04 23:24:00 | Re: VARIANT / ANYTYPE datatype |