| From: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> | 
|---|---|
| To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Cc: | Daniel Verite <daniel(at)manitou-mail(dot)org>, Andreas Karlsson <andreas(at)proxel(dot)se> | 
| Subject: | Re: Unicode normalization SQL functions | 
| Date: | 2020-03-26 07:25:46 | 
| Message-ID: | 7052cc8f-0164-72a8-d9a4-fd32066c938e@2ndquadrant.com | 
| Lists: | pgsql-hackers | 
On 2020-03-24 10:20, Peter Eisentraut wrote:
> Now I have some concerns about the size of the new table in
> unicode_normprops_table.h, and the resulting binary size.  At the very
> least, we should probably make that #ifndef FRONTEND or something like
> that so libpq isn't bloated by it unnecessarily.  Perhaps there is a
> better format for that table?  Any ideas?
I have figured this out. New patch is attached.
First, I have added #ifndef FRONTEND, as mentioned above, so libpq isn't 
bloated.  Second, I have changed the lookup structure to a bitfield, so 
each entry is only 32 bits instead of 64.  Third, I have dropped the 
quickcheck tables for the NFD and NFKD forms.  Those are by far the 
biggest tables, and you still get okay performance if you do the 
normalization check the long way, since we don't need the recomposition 
step in those cases, which is by far the slowest part.  The main use 
case of all of this, I expect, is to check for NFC normalization, so 
it's okay if the other variants are not optimized to the same extent.
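
To illustrate the second point, here is a minimal sketch of how a quick-check entry can be packed into 32 bits with a bitfield. It is not taken from the attached patch: the type name `normprops_entry`, the enum `qc_result`, the table `nfc_qc_table`, and the function `lookup_qc` are all hypothetical, and the table holds only a few sample rows. The idea is that a code point needs at most 21 bits (the maximum is U+10FFFF), so the quick-check value fits in the remaining bits of the same 32-bit word.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical quick-check values; the names are illustrative only. */
typedef enum
{
	QC_NO = 0,
	QC_YES = 1,
	QC_MAYBE = -1
} qc_result;

/*
 * One table entry.  21 bits for the code point plus 4 bits for the
 * quick-check value share one 32-bit unit on common compilers
 * (bitfield layout is implementation-defined, so this is an assumption).
 */
typedef struct
{
	unsigned int codepoint:21;
	signed int	quickcheck:4;
} normprops_entry;

/* A few sample rows, sorted by code point so they can be binary-searched. */
static const normprops_entry nfc_qc_table[] = {
	{0x0300, QC_MAYBE},
	{0x0340, QC_NO},
	{0x0341, QC_NO},
	{0x0343, QC_NO},
};

/* bsearch() comparator: key is a uint32_t code point, elem is a table entry. */
static int
cmp_entry(const void *key, const void *elem)
{
	uint32_t	cp = *(const uint32_t *) key;
	const normprops_entry *e = (const normprops_entry *) elem;

	if (cp < e->codepoint)
		return -1;
	if (cp > e->codepoint)
		return 1;
	return 0;
}

/* Code points that are not listed in the table are treated as QC_YES. */
static qc_result
lookup_qc(uint32_t cp)
{
	const normprops_entry *found;

	assert(cp <= 0x10FFFF);		/* must fit in the 21-bit field */

	found = bsearch(&cp, nfc_qc_table,
					sizeof(nfc_qc_table) / sizeof(nfc_qc_table[0]),
					sizeof(nfc_qc_table[0]), cmp_entry);
	return found ? (qc_result) found->quickcheck : QC_YES;
}
```

With this layout, `lookup_qc(0x0340)` yields `QC_NO` and an unlisted code point such as `0x0041` falls through to `QC_YES`. Storing the code point and the quick-check value as two separate `int`-sized members would instead take 64 bits per entry.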
-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
| Attachment | Content-Type | Size | 
|---|---|---|
| v4-0001-Add-SQL-functions-for-Unicode-normalization.patch | text/plain | 224.6 KB | 