
Speed up ICU case conversion by using ucasemap_utf8To*()

From:	Andreas Karlsson <andreas(at)proxel(dot)se>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Speed up ICU case conversion by using ucasemap_utf8To*()
Date:	2024-12-20 05:20:38
Message-ID:	167986ff-afcf-4542-94c6-61ee8474e138@proxel.se
Lists:	pgsql-hackers

Hi,

Jeff pointed out to me that the case conversion functions in ICU have
UTF-8-specific versions, which means we can call those directly if the
database encoding is UTF-8 and skip having to convert to and from UChar.

Since most people today run their databases in UTF-8, I think this
optimization is worth it. When measuring on short to medium-length
strings I got a 15-20% speedup. It is still slower than glibc in my
benchmarks, but the gap is smaller now.

SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
"sv-SE-x-icu") FROM generate_series(1, 1000000) i);

master: ~540 ms
patched: ~460 ms
glibc: ~410 ms

I have also attached a cleanup patch for the non-UTF-8 code paths. I
thought about doing the same for the new UTF-8 code paths, but it turned
out to be a bit messy due to the different function signatures of
ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs. ucasemap_utf8ToTitle().

Andreas

Attachment	Content-Type	Size
v1-0001-Use-optimized-versions-of-ICU-case-conversion-for.patch	text/x-patch	6.7 KB
v1-0002-Reduce-code-duplication-in-ICU-case-mapping-code.patch	text/x-patch	3.9 KB

Responses

Re: Speed up ICU case conversion by using ucasemap_utf8To*() at 2024-12-20 19:24:04 from Jeff Davis
