Re: speed up unicode normalization quick check

From:	Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To:	John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: speed up unicode normalization quick check
Date:	2020-05-28 21:59:52
Message-ID:	74AA9D41-8B59-41E6-941C-DA6B92A603DA@enterprisedb.com
Lists:	pgsql-hackers

> On May 21, 2020, at 12:12 AM, John Naylor <john(dot)naylor(at)2ndquadrant(dot)com> wrote:
>
> Hi,
>
> Attached is a patch to use perfect hashing to speed up Unicode
> normalization quick check.
>
> 0001 changes the set of multipliers attempted when generating the hash
> function. The set in HEAD works for the current set of NFC codepoints,
> but not for the other types. Also, the updated multipliers now all
> compile to shift-and-add on most platform/compiler combinations
> available on godbolt.org (earlier experiments found in [1]). The
> existing keyword lists are fine with the new set, and don't seem to be
> very picky in general. As a test, it also successfully finds a
> function for the OS "words" file, the "D" sets of codepoints, and for
> sets of the first n built-in OIDs, where n > 5.

Prior to this patch, src/tools/gen_keywordlist.pl was the only script that used PerfectHash; your patch adds a second. I'm not convinced that modifying the PerfectHash code directly each time a new caller needs different multipliers is the right way to go. Could you instead make the multipliers arguments, so that gen_keywordlist.pl, generate-unicode_combining_table.pl, and future callers can pass in the numbers they want? Or is there some advantage to hard-coding them in PerfectHash?
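To make the suggestion concrete, here is a rough sketch of the shape I have in mind. It is written in Python for brevity and is not the PerfectHash code or its algorithm: it only searches for a collision-free (multiplier, seed) pair on a tiny key set. The function names, the multiplier values, and the seed range are all made up for illustration. The point is only that the candidate multipliers come from the caller rather than from the generator itself.

```python
from itertools import product


def string_hash(key: bytes, mult: int, seed: int) -> int:
    """Simple multiplicative string hash, truncated to 32 bits."""
    h = seed
    for byte in key:
        h = (h * mult + byte) & 0xFFFFFFFF
    return h


def find_collision_free(keys, multipliers, seeds=range(64), table_size=None):
    """Return the first (mult, seed) that maps every key to a distinct slot.

    The candidate multipliers are an argument, so each caller can supply
    its own set without the generator being edited. Returns None when no
    candidate works.
    """
    encoded = [k.encode() for k in keys]
    size = table_size if table_size is not None else 2 * len(encoded) + 1
    for mult, seed in product(multipliers, seeds):
        slots = {string_hash(k, mult, seed) % size for k in encoded}
        if len(slots) == len(encoded):
            return mult, seed
    return None


# Hypothetical per-caller candidate sets; the values are illustrative only.
KEYWORD_MULTIPLIERS = (17, 31, 127, 257)

result = find_collision_free(
    ["select", "insert", "update", "delete"], KEYWORD_MULTIPLIERS
)
```

A second caller would pass a different tuple in place of KEYWORD_MULTIPLIERS, and the generator would stay untouched.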

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

speed up unicode normalization quick check at 2020-05-21 07:12:06 from John Naylor

Responses

Re: speed up unicode normalization quick check at 2020-05-29 03:54:39 from John Naylor
