From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Hannu Krosing <hannu(at)tm(dot)ee> |
Cc: | Thomas Lockhart <lockhart(at)alumni(dot)caltech(dot)edu>, pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? |
Date: | 2000-02-20 17:41:44 |
Message-ID: | 5348.951068504@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Hannu Krosing <hannu(at)tm(dot)ee> writes:
> Could you test with some other frontend (python, perl, tcl, C) ?
Yup, psql is untrustworthy as a means of testing the backend's comment
handling ;-).
I committed lexer changes on Friday evening that I believe fix all of
the backend's problems with \r versus \n. The issue with unterminated
-- comments, which was Hannu's original complaint, was fixed a while ago;
but we still had problems with comments terminated with \r instead of
\n, as well as some non-SQL-compliant behavior for -- comments between
the segments of a multiline literal, and so on.
While fixing this I realized that there are some fundamental
discrepancies between the way the backend recognizes comments and the
way that psql does. These arise from the fact that the comment
introducer sequences /* and -- are also legal as parts of operator
names, and since the backend is based on lex which uses greedy longest-
available-match rules, you get things like this:
    select *-- 123
    ERROR: Can't find left op '*--' for type 23
(Parsing '*--' as an operator name wins over parsing just '*' as an
operator name, so that '--' would be recognized on the next call.)
More subtly,
    select /**/- 22
    ERROR: parser: parse error at or near ""
which is the backend's rather lame excuse for an "unterminated comment"
error. What happens here is that the sequence /**/- is bitten off as a
single lexer token, then tested in this order to see if it is
    (a) a complete "/* ... */" comment (nope),
    (b) the start of a comment, "/* anything" (yup), or
    (c) an operator (which would succeed if it got the chance).
There does not seem to be any way to persuade lex to stop at the "*/"
if it has a chance to recognize a longer token by applying the operator
rule.
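The greedy behavior behind both failures can be reproduced with a toy longest-match tokenizer. This is only a sketch in Python, not the backend's lexer: the rule names, the regular expressions, and the function `longest_match_tokens` are invented for illustration. It mimics just two lex conventions: the longest match wins, and earlier rules win ties.

```python
import re

# Hypothetical, simplified rule set; NOT the backend's real lexer rules.
RULES = [
    ("block_comment", re.compile(r"/\*.*?\*/", re.S)),
    ("line_comment", re.compile(r"--[^\r\n]*")),
    ("operator", re.compile(r"[~!@#^&|`?$+\-*/%<>=]+")),  # one or more op chars
    ("number", re.compile(r"\d+")),
    ("word", re.compile(r"[A-Za-z_]\w*")),
    ("space", re.compile(r"\s+")),
]


def longest_match_tokens(text):
    """Tokenize lex-style: at each offset the longest match wins,
    and on equal length the rule listed first wins."""
    pos, out = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when lengths are equal.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_name != "space":
            out.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return out


# '*--' is swallowed whole by the operator rule, so the comment is never seen.
print(longest_match_tokens("select *-- 123"))
# With whitespace, '*' is an operator and '-- 123' is a comment.
print(longest_match_tokens("select * -- 123"))
# The operator rule matches five characters, one more than the comment rule.
print(longest_match_tokens("select /**/- 22"))
```

In this toy the third input comes out as an operator token `/**/-`, whereas the backend classifies the same five characters as the start of a comment because of the order in which it tests the token. What the two share is that the scan runs past the `*/`.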
Both of these problems are easily avoided by inserting some whitespace,
but I wonder whether we ought to try to fix them for real. One way
that this could be done would be to alter the lexer rules so that
operators are lexed a single character at a time, which'd eliminate
lex's tendency to recognize a long operator name in place of a comment.
Then we'd need a post-pass to recombine adjacent operator characters into
a single token. (This would forever prevent anyone from using operator
names that include '--' or '/*', but I'm not sure that's a bad thing.)
The post-pass would also be a mighty convenient place to fix the NOT NULL
problem that's giving us trouble in another thread: the post-pass would
need one-token lookahead anyway, so it could very easily convert NOT
followed by NULL into a single special token.
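A rough sketch of that idea in Python, under stated assumptions: the rule set, the token kinds, and the names `raw_tokens` and `post_pass` are all hypothetical, and comments are simply discarded. Operators are matched one character at a time, so a comment introducer always beats the operator rule on length. The post-pass then glues together operator characters that touch and folds NOT followed by NULL into one token. For brevity it merges by looking back at the previous output token, which has the same effect as one-token lookahead.

```python
import re

# Hypothetical rules for illustration; the operator rule matches ONE character.
RULES = [
    ("block_comment", re.compile(r"/\*.*?\*/", re.S)),
    ("line_comment", re.compile(r"--[^\r\n]*")),
    ("op_char", re.compile(r"[~!@#^&|`?$+\-*/%<>=]")),
    ("number", re.compile(r"\d+")),
    ("word", re.compile(r"[A-Za-z_]\w*")),
    ("space", re.compile(r"\s+")),
]


def raw_tokens(text):
    """Yield (kind, text, start, end) by longest match; drop space and comments."""
    pos = 0
    while pos < len(text):
        best = None
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (kind, m)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        kind, m = best
        if kind not in ("space", "block_comment", "line_comment"):
            yield kind, m.group(), m.start(), m.end()
        pos = m.end()


def post_pass(text):
    """Recombine touching operator characters and fold NOT NULL into one token."""
    out = []       # finished (kind, text) tokens
    last_end = -1  # offset just past the previous non-comment token
    for kind, tok, start, end in raw_tokens(text):
        if kind == "op_char":
            if out and out[-1][0] == "operator" and start == last_end:
                out[-1] = ("operator", out[-1][1] + tok)  # extend the operator
            else:
                out.append(("operator", tok))
        elif (kind == "word" and tok.upper() == "NULL"
              and out and out[-1][0] == "word" and out[-1][1].upper() == "NOT"):
            out[-1] = ("not_null", "NOT NULL")  # single special token
        else:
            out.append((kind, tok))
        last_end = end
    return out


print(post_pass("select *-- 123"))   # '*' is an operator, '-- 123' a comment
print(post_pass("select /**/- 22"))  # '/**/' is a comment, '-' an operator
print(post_pass("a <= b"))           # '<' and '=' recombine into '<='
print(post_pass("x NOT NULL"))
```

As noted above, the price is that an operator name can never contain `--` or `/*`: those characters are claimed by the comment rules before the post-pass ever sees them.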
Meanwhile, psql is using some ad-hoc code to recognize comments,
rather than a lexer, and it thinks both of these sequences are indeed
comments. I also find that it strips out the -- flavor of comment,
but sends the /* */ flavor on through, which is just plain inconsistent.
I suggest we change psql to not strip -- comments either. The only
reason for psql to be in the comment-recognition business at all is
so that it can determine whether a semicolon is end-of-query or just
a character in a comment.
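For what it's worth, that limited job needs very little code. The following Python sketch is not psql's actual logic; the function name `first_statement_end` is made up, and it assumes non-nested /* */ comments and single-quoted literals with doubled quotes (no backslash escapes). It finds the first semicolon that really ends a query and leaves the text, comments included, untouched.

```python
def first_statement_end(buf):
    """Return the index of the first ';' outside comments and single-quoted
    literals, or None if the buffer holds no complete statement yet."""
    i, n = 0, len(buf)
    while i < n:
        two = buf[i:i + 2]
        if two == "--":
            # Line comment: skip to end of line, accepting \r or \n.
            while i < n and buf[i] not in "\r\n":
                i += 1
        elif two == "/*":
            close = buf.find("*/", i + 2)
            if close == -1:
                return None  # unterminated comment
            i = close + 2
        elif buf[i] == "'":
            i += 1
            while i < n:
                if buf[i] == "'":
                    if buf[i + 1:i + 2] == "'":
                        i += 2  # doubled quote stays inside the literal
                        continue
                    break
                i += 1
            if i >= n:
                return None  # unterminated literal
            i += 1  # step past the closing quote
        elif buf[i] == ";":
            return i
        else:
            i += 1
    return None


print(first_statement_end("select 1; -- done"))
print(first_statement_end("select /* open ;"))
```

Like psql's ad-hoc code, this scanner treats the `--` in `*--` as a comment start, so it disagrees with a longest-match lexer on exactly the inputs shown earlier.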
Another thing I'd like to fix here is to get the backend to produce
a more useful error message than 'parse error at or near ""' when it's
presented with an unterminated comment or unterminated literal.
The flex manual recommends coding like
    <quote><<EOF>> {
        error( "unterminated quote" );
        yyterminate();
    }
but <<EOF>> is a flex-ism not supported by regular lex. We already
tell people they have to use flex (though I'm not sure that's *really*
necessary at present); do we want to set that requirement in stone?
Or does anyone know another way to get this effect?
regards, tom lane