From: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, John Naylor <jcnaylor(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date: 2018-12-27 18:54:05
Message-ID: ceb23715-78dd-81b0-4760-da00a8b08ffd@2ndQuadrant.com
Lists: pgsql-hackers
On 12/27/18 12:12 PM, Tom Lane wrote:
>> I don't buy that we're unable to write a descent parser that way.
> I do not think that we could write one for the current state of the
> PG grammar without an investment of effort so large that it's not
> going to happen. Even if such a parser were to spring fully armed
> from somebody's forehead, we absolutely cannot expect that it would
> continue to work correctly after non-wizard contributors modify it.
I just did a quick survey of generator tools. Unfortunately, the best
candidate alternative (ANTLR) no longer supports generating plain C
code. I don't know of another tool that is well maintained, supports C,
and generates top down parsers. Twenty-five years ago or so I wrote a
top-down table-driven parser generator, but that was in another country,
and besides, the wench is dead.
There are well known techniques (see Section 4.4 of the Dragon Book, if
you have a copy) for formal analysis of grammars to determine predictive
parser actions. They aren't hard, and the tables they produce are
typically much smaller than those used for LALR parsers. Still, probably
not for the faint of heart.
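To make the table-driven idea concrete, here is a minimal predictive
parser in C for a toy grammar, S -> ( S ) S | epsilon (balanced
parentheses). The grammar and all names here are invented for this
sketch and have nothing to do with the PG grammar; the point is only
that the "parse table" for an LL(1) grammar collapses to a handful of
lookahead-driven choices:

```c
#include <stddef.h>

/*
 * Toy LL(1) predictive parser for the grammar
 *   S -> ( S ) S | epsilon
 * Symbols: 'S' is the only nonterminal, '(' and ')' are terminals,
 * '$' is the end marker.  This is an illustration only, not PG code.
 */
static int
parse_balanced(const char *input)
{
    char    stack[256];
    int     top = 0;
    size_t  pos = 0;

    stack[top++] = '$';
    stack[top++] = 'S';

    for (;;)
    {
        char    lookahead = input[pos] ? input[pos] : '$';
        char    sym = stack[--top];

        if (sym == '$')
            return lookahead == '$';    /* accept iff input exhausted */

        if (sym == 'S')
        {
            /* the entire "parse table": expand S based on lookahead */
            if (lookahead == '(')
            {
                /* S -> ( S ) S : push the right-hand side reversed */
                stack[top++] = 'S';
                stack[top++] = ')';
                stack[top++] = 'S';
                stack[top++] = '(';
            }
            /* on ')' or '$', use S -> epsilon: push nothing */
        }
        else
        {
            /* terminal on the stack must match the input */
            if (sym != lookahead)
                return 0;
            pos++;
        }
    }
}
```

A real generator would emit the expansion choices as an indexed table
rather than if/else chains, but the table row count is bounded by
nonterminals x terminals, which is where the size win comes from.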
The tools that have moved to hand-cut RD parsers have done so precisely
because of the significant performance benefit.
RD parsers are not terribly hard to write. Yes, the JSON grammar is
tiny, but I think I wrote the basics of the RD parser we use for JSON in
about an hour. I think arguing that our hacker base is not competent to
maintain such a thing for the SQL grammar is wrong. We successfully
maintain vastly more complex pieces of code.
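As a sketch of how approachable this style is, here is a toy
recursive-descent recognizer for a small slice of JSON (nulls, unsigned
integers, arrays). To be clear, this is an invented illustration, not
the parser we actually ship for json(b); each grammar production simply
becomes one C function:

```c
#include <ctype.h>
#include <string.h>

typedef struct ParseState
{
    const char *p;              /* current position in the input */
    int         ok;             /* sticky error flag */
} ParseState;

static void parse_value(ParseState *ps);

static void
skip_ws(ParseState *ps)
{
    while (isspace((unsigned char) *ps->p))
        ps->p++;
}

static void
expect(ParseState *ps, char c)
{
    skip_ws(ps);
    if (*ps->p == c)
        ps->p++;
    else
        ps->ok = 0;
}

/* array := '[' [ value { ',' value } ] ']' */
static void
parse_array(ParseState *ps)
{
    expect(ps, '[');
    skip_ws(ps);
    if (*ps->p != ']')
    {
        parse_value(ps);
        skip_ws(ps);
        while (ps->ok && *ps->p == ',')
        {
            ps->p++;
            parse_value(ps);
            skip_ws(ps);
        }
    }
    expect(ps, ']');
}

/* value := array | 'null' | integer */
static void
parse_value(ParseState *ps)
{
    skip_ws(ps);
    if (*ps->p == '[')
        parse_array(ps);
    else if (strncmp(ps->p, "null", 4) == 0)
        ps->p += 4;
    else if (isdigit((unsigned char) *ps->p))
    {
        while (isdigit((unsigned char) *ps->p))
            ps->p++;
    }
    else
        ps->ok = 0;
}

/* Return 1 if the whole input is one valid value, else 0. */
static int
json_subset_valid(const char *input)
{
    ParseState  ps = {input, 1};

    parse_value(&ps);
    skip_ws(&ps);
    return ps.ok && *ps.p == '\0';
}
```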
Having said all that, I don't intend to spend any time on implementing
an alternative parser. It would, as you say, involve a heck of a lot of
time, which I don't have. It would be a fine academic research project
for some student.
A smaller project might be to see if we can replace the binary keyword
search in ScanKeyword with a perfect hashing function generated by
gperf, or something similar. I had a quick look at that, too.
Unfortunately the smallest hash table I could generate for our 440
symbols had 1815 entries, so I'm not sure how well that would work.
Worth investigating, though.
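For anyone unfamiliar with the shape of gperf's output, here is a
hand-rolled stand-in: the five keywords and the hash function (length +
first char + last char, mod 16) are made up for this sketch, whereas
gperf searches for such a function automatically. Note the table is
deliberately sparse (16 slots for 5 keywords), which is the same effect
as the 1815-entry table for our 440 symbols:

```c
#include <string.h>

#define HASH_TABLE_SIZE 16

/* Sparse keyword table; empty slots stay NULL.  (The slot numbers
 * come from the toy hash function below.) */
static const char *const keyword_table[HASH_TABLE_SIZE] = {
    [0] = "update",
    [1] = "where",
    [3] = "insert",
    [13] = "select",
    [15] = "delete",
};

/* Toy perfect hash over these five keywords only */
static unsigned int
keyword_hash(const char *s, size_t len)
{
    return (unsigned int) (len
                           + (unsigned char) s[0]
                           + (unsigned char) s[len - 1]) % HASH_TABLE_SIZE;
}

/* Return the canonical keyword string, or NULL if not a keyword. */
static const char *
keyword_lookup(const char *word)
{
    size_t      len = strlen(word);
    const char *candidate;

    if (len == 0)
        return NULL;
    candidate = keyword_table[keyword_hash(word, len)];

    /* one strcmp verifies the single candidate; no binary search */
    if (candidate && strcmp(candidate, word) == 0)
        return candidate;
    return NULL;
}
```

The trade-off is exactly the one above: O(1) lookup with a single
string comparison, paid for with NULL padding in the table.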
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services