From: | Andreas Karlsson <andreas(at)proxel(dot)se> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Subject: | Speed up ICU case conversion by using ucasemap_utf8To*() |
Date: | 2024-12-20 05:20:38 |
Message-ID: | 167986ff-afcf-4542-94c6-61ee8474e138@proxel.se |
Lists: | pgsql-hackers |
Hi,

Jeff pointed out to me that the case conversion functions in ICU have
UTF-8-specific versions, which means we can call those directly when the
database encoding is UTF-8 and skip converting to and from UChar.

Since most people today run their databases in UTF-8, I think this
optimization is worth it. When measuring on short to medium-length
strings I got a 15-20% speedup. It is still slower than glibc in my
benchmarks, but the gap is smaller now.

SELECT count(upper)
FROM (SELECT upper(('Kålhuvud ' || i) COLLATE "sv-SE-x-icu")
      FROM generate_series(1, 1000000) i);

master:  ~540 ms
patched: ~460 ms
glibc:   ~410 ms

I have also attached a cleanup patch for the non-UTF-8 code paths. I
thought about doing the same for the new UTF-8 code paths, but it turned
out to be a bit messy due to the different function signatures of
ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() versus
ucasemap_utf8ToTitle().
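
To illustrate the kind of signature mismatch involved, here is a self-contained sketch of the usual workaround: a common function pointer type plus a thin wrapper. It does not use ICU at all. DemoCaseMap and every demo_* function are hypothetical ASCII-only stand-ins, and the assumption that the mismatch is a const versus non-const case map argument reflects my reading of the ICU headers, not anything stated in the patch.

```c
#include <ctype.h>
#include <stdint.h>

/* Hypothetical stand-in for ICU's UCaseMap; purely illustrative. */
typedef struct DemoCaseMap
{
    int         titlecase_calls;
} DemoCaseMap;

/* Stand-in shaped like the upper/lower variants: takes a const case map. */
int32_t
demo_to_upper(const DemoCaseMap *csm, char *dest, int32_t cap,
              const char *src, int32_t len)
{
    (void) csm;
    for (int32_t i = 0; i < len && i < cap; i++)
        dest[i] = (char) toupper((unsigned char) src[i]);
    return len;
}

/* Stand-in shaped like the title variant: non-const, it updates state. */
int32_t
demo_to_title(DemoCaseMap *csm, char *dest, int32_t cap,
              const char *src, int32_t len)
{
    int         at_word_start = 1;

    csm->titlecase_calls++;
    for (int32_t i = 0; i < len && i < cap; i++)
    {
        unsigned char c = (unsigned char) src[i];

        dest[i] = (char) (at_word_start ? toupper(c) : tolower(c));
        at_word_start = !isalnum(c);
    }
    return len;
}

/* Common signature that a shared driver can accept. */
typedef int32_t (*DemoCaseFunc) (DemoCaseMap *csm, char *dest, int32_t cap,
                                 const char *src, int32_t len);

/* Thin wrapper adapting the const-taking function to DemoCaseFunc. */
int32_t
demo_to_upper_wrapper(DemoCaseMap *csm, char *dest, int32_t cap,
                      const char *src, int32_t len)
{
    return demo_to_upper(csm, dest, cap, src, len);
}

/* Shared driver: one place for termination (and, in real code, resizing). */
int32_t
demo_convert_case(DemoCaseFunc func, DemoCaseMap *csm, char *dest,
                  int32_t cap, const char *src, int32_t len)
{
    int32_t     needed = func(csm, dest, cap, src, len);

    if (needed < cap)
        dest[needed] = '\0';
    return needed;
}
```

The cost of this pattern is one extra wrapper per mismatched function, which is roughly the messiness described above: whether it pays off depends on how much buffer-handling code the shared driver actually removes.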
Andreas
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Use-optimized-versions-of-ICU-case-conversion-for.patch | text/x-patch | 6.7 KB |
v1-0002-Reduce-code-duplication-in-ICU-case-mapping-code.patch | text/x-patch | 3.9 KB |