Re: Speed up ICU case conversion by using ucasemap_utf8To*()

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up ICU case conversion by using ucasemap_utf8To*()
Date: 2024-12-20 19:24:04
Message-ID: 72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com
Lists: pgsql-hackers

On Fri, 2024-12-20 at 06:20 +0100, Andreas Karlsson wrote:
> SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
>
> master:  ~540 ms
> Patched: ~460 ms
> glibc:   ~410 ms

It looks like you are opening and closing the UCaseMap object each
time. Why not save it in pg_locale_t? That should speed it up even more
and hopefully beat libc.

Also, to support older ICU versions consistently, we need to fix up the
locale name to support "und"; cf. pg_ucol_open(). Perhaps factor out
that logic?
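
A rough sketch of what a factored-out helper could look like, under some assumptions. As I understand pg_ucol_open(), it rewrites a leading "und" language to "root" before opening, and only when building against old ICU versions. The function name demo_fix_und_locale() is hypothetical, it uses plain malloc() rather than palloc(), and it matches the prefix by simple string comparison instead of asking ICU for the language, so it is a simplification rather than a copy of the existing logic:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of loc_str in which a leading "und"
 * language subtag is replaced by "root". Any other locale name is
 * copied unchanged. Returns NULL if allocation fails; the caller
 * frees the result.
 */
char *
demo_fix_und_locale(const char *loc_str)
{
    static const char und[] = "und";
    static const char root[] = "root";
    const size_t undlen = sizeof(und) - 1;
    const char *prefix = "";
    const char *rest;
    char       *result;

    assert(loc_str != NULL);

    rest = loc_str;

    /* "und" must be the whole first component, not a prefix of a longer one. */
    if (strncmp(loc_str, und, undlen) == 0 &&
        (loc_str[undlen] == '\0' || loc_str[undlen] == '-' ||
         loc_str[undlen] == '_' || loc_str[undlen] == '@'))
    {
        prefix = root;
        rest = loc_str + undlen;
    }

    result = malloc(strlen(prefix) + strlen(rest) + 1);
    if (result == NULL)
        return NULL;

    strcpy(result, prefix);
    strcat(result, rest);
    return result;
}
```

Both the collator path and the case-mapping path could then pass the fixed-up name to their respective open functions, so that "und" behaves the same way regardless of ICU version.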

Regards,
Jeff Davis
