From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Andreas Karlsson <andreas(at)proxel(dot)se>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com> |
Subject: | Re: Speed up ICU case conversion by using ucasemap_utf8To*() |
Date: | 2025-03-30 01:18:46 |
Message-ID: | CALDaNm32RbiXNSb66Ui5cY=TnxjeQ_-hCHqwTx3c89S1UT0YNQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, 30 Mar 2025 at 00:20, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2025-03-17 12:16:11 +0530, vignesh C wrote:
> > On Fri, 20 Dec 2024 at 10:50, Andreas Karlsson <andreas(at)proxel(dot)se> wrote:
> > >
> > > Hi,
> > >
> > > Jeff pointed out to me that the case conversion functions in ICU have
> > > UTF-8 specific versions which means we can call those directly if the
> > > database encoding is UTF-8 and skip having to convert to and from UChar.
> > >
> > > Since most people today run their databases in UTF-8 I think this
> > > optimization is worth it and when measuring on short to medium length
> > > strings I got a 15-20% speed up. It is still slower than glibc in my
> > > benchmarks but the gap is smaller now.
> > >
> > > SELECT count(upper) FROM (SELECT upper(('Kålhuvud ' || i) COLLATE
> > > "sv-SE-x-icu") FROM generate_series(1, 1000000) i);
> > >
> > > master: ~540 ms
> > > Patched: ~460 ms
> > > glibc: ~410 ms
> > >
> > > I have also attached a clean up patch for the non-UTF-8 code paths. I
> > > thought about doing the same for the new UTF-8 code paths but it turned
> > > out to be a bit messy due to different function signatures for
> > > ucasemap_utf8ToUpper() and ucasemap_utf8ToLower() vs ucasemap_utf8ToTitle().
> >
> > I noticed that Jeff's comments from [1] have not yet been addressed, so
> > I have changed the commitfest entry status to "Waiting on Author".
> > Please address them and then update the status to "Needs Review".
> > [1] - https://www.postgresql.org/message-id/72c7c2b5848da44caddfe0f20f6c7ebc7c0c6e60.camel@j-davis.com
>
> It's also worth noting that this patch hasn't been building for quite a while
> (at least not since 2025-01-29):
>
> https://cirrus-ci.com/task/5621435164524544?logs=build#L1228
> [17:17:51.214] ld: error: undefined symbol: icu_convert_case
> [17:17:51.214] >>> referenced by pg_locale_icu.c:484 (../src/backend/utils/adt/pg_locale_icu.c:484)
> [17:17:51.214] >>> src/backend/postgres_lib.a.p/utils_adt_pg_locale_icu.c.o:(strfold_icu)
> [17:17:51.214] cc: error: linker command failed with exit code 1 (use -v to see invocation)
>
> I think we can mark this as returned-with-feedback for now?

Thanks, I have marked the commitfest entry as returned with feedback.

@Andreas Karlsson Feel free to add a new commitfest entry once you
have addressed the feedback.

Regards,
Vignesh