From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Kevin Grittner <kgrittn(at)mail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Parser Cruft in gram.y |
Date: | 2012-12-18 02:00:37 |
Message-ID: | CA+TgmoYYs++fs0WkVHWXXq=7Ynj94VviDPUorE2=EGFCuz7uQg@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Dec 15, 2012 at 11:52 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <kgrittn(at)mail(dot)com> writes:
>> Tom Lane wrote:
>>> the parser tables are basically number-of-tokens wide by
>>> number-of-states high. (In HEAD there are 433 tokens known to the
>>> grammar, all but 30 of which are keywords, and 4367 states.)
>>>
>>> Splitting the grammar into multiple grammars is unlikely to do
>>> much to improve this --- in fact, it could easily make matters
>>> worse due to duplication.
>
>> Of course if they were both at 80% it would be a higher total than
>> combined, but unless you have a handle on the percentages, it
>> doesn't seem like a foregone conclusion. Do you have any feel for
>> what the split would be?
>
> I don't really, but I will note that the scalar-expression subgrammar is
> a pretty sizable part of the whole, and it's difficult to see how you'd
> make a useful split that didn't duplicate it. I guess you could push
> CREATE TABLE, ALTER TABLE, CREATE DOMAIN, ALTER DOMAIN, COPY, and
> anything else that included expression arguments over into the "main"
> grammar. But that path leads to more and more stuff getting moved to
> the "main" grammar over time, making the whole thing more and more
> questionable. The whole concept seems ugly and unmaintainable in any
> case.

I thought a little bit in the past about the sort of thing that Dimitri
is proposing, and it seemed to me that one could put DML in
one grammar and everything else in another grammar and then decide,
based on the first word of the input, which grammar to use. But there
are a couple of problems with this. First, the DML grammar has to
include an awful lot of stuff, because the grammar for expressions is
really complicated and involves a lot of things like special-case
syntax for XML that are probably not really carrying their weight but
which cannot easily be factored out. Second, the DDL grammar would
have to duplicate a lot of stuff that also shows up in the DML
grammar, because things like expressions can also show up in DEFAULT
or USING clauses which show up in things like CREATE TABLE and ALTER
TABLE and CREATE SCHEMA .. CREATE TABLE.
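
To make the first-word dispatch described above concrete, here is a minimal sketch in Python. It is not how PostgreSQL's parser works; the keyword set, the function name choose_grammar, and the "dml"/"full" labels are all hypothetical and exist only for illustration.

```python
import re

# Hypothetical set of statement-leading keywords routed to the small DML
# grammar; this is an illustration, not PostgreSQL's actual keyword list.
DML_KEYWORDS = frozenset({"SELECT", "INSERT", "UPDATE", "DELETE"})

# Matches optional leading whitespace followed by one alphabetic word.
_FIRST_WORD = re.compile(r"\s*([A-Za-z_]+)")


def choose_grammar(sql: str) -> str:
    """Return "dml" or "full" based only on the first word of the input."""
    match = _FIRST_WORD.match(sql)
    if match is None:
        # No leading word (empty input, leading parenthesis, comment, ...):
        # fall back to the full grammar.
        return "full"
    return "dml" if match.group(1).upper() in DML_KEYWORDS else "full"
```

Under these assumptions, "select 1" is routed to the small grammar, while "CREATE TABLE t (a int DEFAULT 1 + 1)" is routed to the full one. The second case still contains an expression, which is why the expression rules would have to exist in both grammars.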

Now either one of these problems by itself might not be sufficient to
kill the idea: if the DML grammar were a small subset of the full
grammar, one might not mind duplicating some stuff, on the grounds
that in most cases that full grammar would not be used, and using only
the smaller tables most of the time would be easier on the L1 cache.
And on the other hand, if you could get a clean split between the two
grammars, then regardless of exactly what the split was, it might seem
a win. But it seemed to me when I looked at this that you'd have to
duplicate a lot of stuff and the small parser still wouldn't end up
being very small, which I found hard to get excited about.
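
The size trade-off can be put into rough numbers. The sketch below uses Tom's figures (433 tokens, 4367 states) and treats the table as a dense tokens-by-states array, which is only an upper-bound model: generated parsers typically store their tables in compressed form, so real sizes are smaller. The percentages passed to split_total are hypothetical, since nobody in this thread has measured an actual split.

```python
TOKENS = 433    # tokens known to the grammar, per Tom Lane's figures
STATES = 4367   # parser states, per the same message


def dense_cells(tokens: int, states: int) -> int:
    """Cell count of an uncompressed tokens-wide by states-high table."""
    return tokens * states


def split_total(full_cells: int, small_pct: int, large_pct: int) -> int:
    """Total cells when the two grammars are small_pct% and large_pct%
    of the full table size (hypothetical percentages)."""
    return full_cells * (small_pct + large_pct) // 100


full = dense_cells(TOKENS, STATES)

# Heavy duplication: both grammars end up at 80% of the original size.
both_80 = split_total(full, 80, 80)

# An idealized clean split with no duplication at all.
clean_split = split_total(full, 30, 70)
```

In this model a split only pays off if the two percentages sum to not much more than 100 and the first one is small, and those are the two conditions that the expression subgrammar makes hard to meet.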

I'm frankly kind of shocked at the revelation that the parser is
already 14% of the backend. I knew it was big; I didn't realize it
was THAT big.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company