Quick Links

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

From:	John Naylor <jcnaylor(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)
Date:	2018-12-27 16:19:51
Message-ID:	CAJVSVGUUOTd2_Rv46FTcy9Yp6+xMwo0BwdSvPWCaLia_tesQjA@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 12/26/18, John Naylor <jcnaylor(at)gmail(dot)com> wrote:
> On 12/26/18, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I wonder if we could do something really simple like a lookup based on
>> the first character of the scan keyword. It looks to me like there are
>> 440 keywords right now, and the most common starting letter is 'c',
>> which is the first letter of 51 keywords. So dispatching based on the
>> first letter clips at least 3 steps off the binary search. I don't
>> know whether that's enough to be worthwhile, but it's probably pretty
>> simple to implement.

> I agree it'd be fairly simple to do, and might raise the bar for doing
> anything more complex.

I went ahead and did this for v4, but split out into a separate patch.
In addition, I used a heuristic to bypass binary search for the most
common keywords. Normally, the middle value is computed
mathematically, but I found that in each range of keywords beginning
with the same letter, there is often 1 or 2 common keywords that are
good first guesses, such as select, from, join, limit, where. I taught
the lookup to try those first, and then compute subsequent steps the
usual way.

Barring additional bikeshedding on 0001, I'll plan on implementing
offset-based lookup for the other keyword types and retire the old
ScanKeyword. Once that's done, we can benchmark and compare with the
optimizations in 0002.

-John Naylor

Attachment	Content-Type	Size
v4-0001-WIP-Use-offset-based-keyword-lookup.patch	text/x-patch	22.8 KB
v4-0002-Dispatch-keyword-lookup-on-the-first-character.patch	text/x-patch	6.8 KB

In response to

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) at 2018-12-26 19:45:47 from John Naylor

Responses

Re: reducing the footprint of ScanKeyword (was Re: Large writable variables) at 2018-12-27 18:57:04 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Hugh Ranalli	2018-12-27 16:21:52	Re: BUG #15548: Unaccent does not remove combining diacritical characters
Previous Message	Magnus Hagander	2018-12-27 14:55:58	Re: Offline enabling/disabling of data checksums