From: | John Naylor <john(dot)naylor(at)enterprisedb(dot)com> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: speed up unicode decomposition and recomposition |
Date: | 2020-10-15 02:56:41 |
Message-ID: | CAFBsxsGndjXKUOzwzK0C2aaCwqnp01Nw3YrpP-x3PPJrfCB+8Q@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Oct 14, 2020 at 8:25 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> On Wed, Oct 14, 2020 at 01:06:40PM -0400, Tom Lane wrote:
> > IIUC, the only place libpq uses this is to process a password-sized
> > string or two during connection establishment. It seems quite silly to add
> > 26kB in order to make that faster. Seems like a nice speedup on the
> > backend side, but I'd vote for keeping the frontend as-is.
>
> Agreed. Let's only use the perfect hash in the backend. It would be
> nice to avoid an extra generation of the decomposition table for that,
> and a table ordered by codepoints is easier to look at. How much do
> you think would be the performance impact if we don't use for the
> linear search the most-optimized decomposition table?
>
With those points in mind and thinking more broadly, I'd like to try harder
on recomposition. Even at several times faster, recomposition is still orders
of magnitude slower than ICU, as measured by Daniel Verite [1]. I only did
it this way because I couldn't think of how to do the inverse lookup with a
hash. But I think if we constructed the hash key on the C side like
pg_hton64(((uint64) code1 << 32) | code2)
and on the Perl side did something like
pack('N',$code1) . pack('N',$code2)
that might work. Or something that looks more like the C side. We'd also
need to make sure to use the lowest codepoint for the result. That way, we
can still keep the large decomp array ordered, making it easier to keep the
current implementation in the frontend, and hopefully get even better
performance in the backend.
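
To make the key idea concrete, here is a minimal standalone C sketch of how the
two code points could be packed into one 8-byte key whose byte layout matches
the Perl pack('N',...) . pack('N',...) string. This is only an illustration
under assumptions, not code from the patch: the names put_be32,
make_recomp_key, make_recomp_key_u64 and recomp_key_equal are hypothetical, and
the bytes are written by hand rather than with pg_hton64 so that the example
compiles outside the PostgreSQL tree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RECOMP_KEY_LEN 8

/*
 * Hypothetical helper: store a 32-bit value most significant byte first,
 * which is the layout Perl's pack('N', ...) produces.
 */
static void
put_be32(unsigned char *dst, uint32_t v)
{
	dst[0] = (unsigned char) (v >> 24);
	dst[1] = (unsigned char) (v >> 16);
	dst[2] = (unsigned char) (v >> 8);
	dst[3] = (unsigned char) v;
}

/*
 * Hypothetical helper: build the 8-byte lookup key for the pair
 * (code1, code2), first code point in the leading four bytes.
 */
static void
make_recomp_key(uint32_t code1, uint32_t code2,
				unsigned char key[RECOMP_KEY_LEN])
{
	/* Unicode code points never exceed U+10FFFF. */
	assert(code1 <= 0x10FFFF && code2 <= 0x10FFFF);
	put_be32(key, code1);
	put_be32(key + 4, code2);
}

/*
 * The same pair as a single 64-bit integer.  The cast must happen before
 * the shift: shifting a 32-bit value left by 32 bits is undefined in C.
 */
static uint64_t
make_recomp_key_u64(uint32_t code1, uint32_t code2)
{
	return ((uint64_t) code1 << 32) | code2;
}

/* Two keys match only if all eight bytes are identical. */
static int
recomp_key_equal(const unsigned char *a, const unsigned char *b)
{
	return memcmp(a, b, RECOMP_KEY_LEN) == 0;
}
```

For example, the pair U+0041, U+0300 (which recomposes to U+00C0) yields the
bytes 00 00 00 41 00 00 03 00. Converting the 64-bit form to network byte
order gives the same eight bytes in memory regardless of host endianness,
which is what lets the generated table and the C lookup agree on the key.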
[1]
https://www.postgresql.org/message-id/2c5e8df9-43b8-41fa-88e6-286e8634f00a%40manitou-mail.org
--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company