pgsql: Improve performance of Unicode {de,re}composition in the backend

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: pgsql-committers(at)lists(dot)postgresql(dot)org
Subject: pgsql: Improve performance of Unicode {de,re}composition in the backend
Date: 2020-10-23 02:09:42
Message-ID: E1kVmWU-0005ls-SS@gemulon.postgresql.org
Lists: pgsql-committers

Improve performance of Unicode {de,re}composition in the backend

This replaces the existing binary search with two perfect hash functions,
one for composition and one for decomposition, in the backend code, at the
cost of slightly larger binaries there (35kB in libpgcommon_srv.a). Per
the measurements done, this speeds up recomposition and decomposition by
up to 30 to 40 times for the NFC and NFKC conversions, while all other
operations get at least 40% faster. This is not as "good" as what libicu
has, but it closes much of the gap, as per the feedback from Daniel
Verite.
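
To illustrate the difference between the two lookup strategies, here is a
minimal sketch, not the code from this commit. The four-entry table, the
toy_ names and the hand-picked hash are assumptions made for this example;
the real hash functions are generated and live in unicode_norm_hashfunc.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry: one precomposed character and its
 * two-codepoint canonical decomposition. */
typedef struct
{
	uint32_t	codepoint;
	uint32_t	base;
	uint32_t	mark;
} toy_decomp;

/* Sorted by codepoint, so it can also serve a binary search. */
static const toy_decomp toy_table[] = {
	{0x00C0, 0x0041, 0x0300},	/* A + combining grave */
	{0x00C1, 0x0041, 0x0301},	/* A + combining acute */
	{0x00C8, 0x0045, 0x0300},	/* E + combining grave */
	{0x00C9, 0x0045, 0x0301},	/* E + combining acute */
};

#define TOY_TABLE_LEN (sizeof(toy_table) / sizeof(toy_table[0]))

/* Binary search: O(log n) comparisons per lookup. */
static const toy_decomp *
toy_lookup_bsearch(uint32_t cp)
{
	size_t		lo = 0;
	size_t		hi = TOY_TABLE_LEN;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (toy_table[mid].codepoint == cp)
			return &toy_table[mid];
		if (toy_table[mid].codepoint < cp)
			lo = mid + 1;
		else
			hi = mid;
	}
	return NULL;
}

/* Hand-picked for the four keys above: each maps to its own slot 0..3.
 * This is an assumption of the sketch, not a generated function. */
static size_t
toy_hash(uint32_t cp)
{
	return ((cp >> 2) & 2) | (cp & 1);
}

/* Perfect hash: one probe, then one comparison to reject non-members,
 * since any codepoint outside the key set still hashes to some slot. */
static const toy_decomp *
toy_lookup_hash(uint32_t cp)
{
	const toy_decomp *entry = &toy_table[toy_hash(cp)];

	return (entry->codepoint == cp) ? entry : NULL;
}
```

The final comparison in toy_lookup_hash() is what makes the scheme safe
for arbitrary input: the hash is only guaranteed collision-free for the
keys it was built from.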

The decomposition table remains the same and is still used for the binary
search in the frontend code, where we care more about the size of
libraries like libpq than about performance, as this code is involved only
in code paths related to SCRAM authentication. As a consequence, note
that the perfect hash function for the recomposition needs to use a new
inverse lookup array back to the existing decomposition table.
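
The role of the inverse lookup array can be sketched as follows. This is
an illustration under stated assumptions rather than the committed code:
the table, the toy_ names and the hand-picked hash are hypothetical. The
table stays sorted by composed codepoint, as a binary search needs, while
the recomposition hash is keyed on the (base, mark) pair and so yields
slots in a different order; the extra array maps each slot back to a table
index.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified decomposition entry. */
typedef struct
{
	uint32_t	codepoint;
	uint32_t	base;
	uint32_t	mark;
} toy_decomp;

/* Unchanged table, sorted by composed codepoint. */
static const toy_decomp toy_table[] = {
	{0x00C0, 0x0041, 0x0300},	/* index 0: A + grave */
	{0x00C1, 0x0041, 0x0301},	/* index 1: A + acute */
	{0x00C8, 0x0045, 0x0300},	/* index 2: E + grave */
	{0x00C9, 0x0045, 0x0301},	/* index 3: E + acute */
};

/* Hand-picked for the four pairs above; slot order differs from the
 * table order: (A,grave)=0, (E,grave)=1, (A,acute)=2, (E,acute)=3. */
static size_t
toy_recomp_hash(uint32_t base, uint32_t mark)
{
	return ((mark & 1) << 1) | ((base >> 2) & 1);
}

/* Maps a hash slot back to an index into toy_table. */
static const uint8_t toy_inverse_lookup[] = {0, 2, 1, 3};

/* Returns true and sets *result if (base, mark) composes to a
 * character in the table; returns false otherwise. */
static bool
toy_recompose(uint32_t base, uint32_t mark, uint32_t *result)
{
	size_t		slot = toy_recomp_hash(base, mark);
	const toy_decomp *entry = &toy_table[toy_inverse_lookup[slot]];

	/* Reject pairs that are not keys but landed on this slot. */
	if (entry->base != base || entry->mark != mark)
		return false;

	*result = entry->codepoint;
	return true;
}
```

Keeping the indirection in a small separate array means the table itself
does not have to be duplicated or reordered for the recomposition path.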

The size of all frontend deliverables remains unchanged, even with
--enable-debug, including libpq.

Author: John Naylor
Reviewed-by: Michael Paquier, Tom Lane
Discussion: https://postgr.es/m/CAFBsxsHUuMFCt6-pU+oG-F1==CmEp8wR+O+bRouXWu6i8kXuqA@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/783f0cc64dcc05e3d112a06b1cd181e5a1ca9099

Modified Files
--------------
src/common/unicode/Makefile                       |    4 +-
src/common/unicode/generate-unicode_norm_table.pl |  226 +-
src/common/unicode_norm.c                         |  106 +-
src/include/common/unicode_norm_hashfunc.h        | 2932 +++++++++++++++++++++
src/tools/pgindent/exclude_file_patterns          |    3 +-
5 files changed, 3227 insertions(+), 44 deletions(-)
