Re: Optimization for lower(), upper(), casefold() functions.

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Borisov <lex(dot)borisov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimization for lower(), upper(), casefold() functions.
Date: 2025-02-18 22:02:25
Message-ID: d0dd50662da84f582c34247f4ed43061e8d86d34.camel@j-davis.com
Lists: pgsql-hackers

On Tue, 2025-02-11 at 23:08 +0300, Alexander Borisov wrote:
> I tried the approach via a range table. The result was worse than
> without the table. With branching in a function, the result is
> better.
>
> Patch v3 — ranges binary search by branches.
> Patch v4 — ranges binary search by table.

Thoughts on v3:

It looks like the top 5 bits of the offset are unused. What if we used
those bits for flags to indicate:

HAS_LOWER
HAS_UPPER
HAS_FOLD
HAS_SPECIAL
HAS_TITLE

That way, we only need to look in the corresponding table if the
codepoint actually has a mapping there other than the codepoint itself.

It doesn't leave a lot of room if the tables get larger, but if we are
worried about that, we could eliminate HAS_TITLE, because I don't think
the performance for INITCAP() is as important as LOWER/UPPER/CASEFOLD.
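
To make the idea concrete, here is a rough sketch, not code from the
patch. It assumes a 16-bit entry whose low 11 bits hold the offset and
whose top 5 bits are free; the actual width in v3 may differ. The names
toy_entry, toy_lower, toy_upper, toy_to_lower and toy_to_upper are
hypothetical, and toy_entry stands in for the range search.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16-bit entry, low 11 bits = offset, top 5 bits = flags. */
#define CASE_OFFSET_BITS 11
#define CASE_OFFSET_MASK ((1u << CASE_OFFSET_BITS) - 1)

#define HAS_LOWER   (1u << 11)
#define HAS_UPPER   (1u << 12)
#define HAS_FOLD    (1u << 13)
#define HAS_SPECIAL (1u << 14)
#define HAS_TITLE   (1u << 15)

/* Hypothetical toy mapping tables; slot 0 is unused. */
static const uint32_t toy_lower[] = {0, 0x61};	/* 'a' */
static const uint32_t toy_upper[] = {0, 0x41};	/* 'A' */

/* Hypothetical stand-in for the range search: codepoint -> packed entry. */
static uint16_t
toy_entry(uint32_t cp)
{
	if (cp == 0x41)				/* 'A': has lower and fold mappings */
		return (uint16_t) (1 | HAS_LOWER | HAS_FOLD);
	if (cp == 0x61)				/* 'a': has upper and title mappings */
		return (uint16_t) (1 | HAS_UPPER | HAS_TITLE);
	return 0;					/* no flags: every mapping is the identity */
}

static uint32_t
toy_to_lower(uint32_t cp)
{
	uint16_t	entry = toy_entry(cp);
	uint16_t	offset;

	/* Flag clear: skip the table lookup and return the codepoint. */
	if (!(entry & HAS_LOWER))
		return cp;

	offset = entry & CASE_OFFSET_MASK;
	assert(offset < sizeof(toy_lower) / sizeof(toy_lower[0]));
	return toy_lower[offset];
}

static uint32_t
toy_to_upper(uint32_t cp)
{
	uint16_t	entry = toy_entry(cp);
	uint16_t	offset;

	if (!(entry & HAS_UPPER))
		return cp;

	offset = entry & CASE_OFFSET_MASK;
	assert(offset < sizeof(toy_upper) / sizeof(toy_upper[0]));
	return toy_upper[offset];
}
```

With this layout, lower('a') never touches the lowercase table, since
HAS_LOWER is clear for that entry. Dropping HAS_TITLE would free one
more bit for the offset, at the cost of an unconditional lookup in the
title-case path.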

Regards,
Jeff Davis
