Re: Optimization for lower(), upper(), casefold() functions.

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Borisov <lex(dot)borisov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimization for lower(), upper(), casefold() functions.
Date: 2025-03-15 20:07:44
Message-ID: 177451ea791a1f90039ea3047f1f890b1f9ae1db.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 2025-03-14 at 15:00 +0300, Alexander Borisov wrote:
> I tried adding a loop to create tables, and everything looks fine
> (v7).
> Also removed unnecessary (hanging) global variables.

Changed. I used a loop more similar to your first one (hash of arrays),
and I left case_map_special outside of the loop. It a different type of
array: rather than mapping to a codepoint, it maps to another index,
and I think it's a different case (and needs a different comment).

>
> It's all about not getting too carried away. In my vision these
> subroutines (user-defined functions) will have to be moved to perl
> module (.pm) in the future, after the patch is committed, so that the
> code can be used for Normalization Forms.

I prefer to generalize when we have the other code in place. As it was,
it was a bit confusing why the extra arguments and subroutines were
there.

Also, make_ranges() doesn't seem to do what is described in the
comment: it produces a single range. I changed "> $limit" to ">=
$limit", but then it generates 3 ranges, not two. I rewrote it to be
more clear what's going on:

* I used new looping logic that, at least to me, seems a bit simpler.

* I changed the ranges from an array of 4 elements to a hash with 3
keys, which is more descriptive when using it.

* I changed the terminology of a range so that it's Start and End,
because $from and $to are used by branches() to mean something else.

In branch():

* I got rid of the third element "VALUE" from the range. It was used to
be the start of the next range, but there was already code in the
caller to look ahead to the next range, so it didn't serve any purpose.

* If we rely on some compiler optimizations, it might be possible to
simplify branch() a bit, but I'm fine with the way it's done.

When looking around, I didn't find a lot of material discussing this
generated-branches approach. While it's mentioned a few places, I
couldn't even find a consistent name for it. If you know of something
where I can read some more analysis, please send a link.

Committed. Thank you!

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-03-15 20:11:26 Re: Optimization for lower(), upper(), casefold() functions.
Previous Message Dean 2025-03-15 19:28:50 Proposal: Deferred Replica Filtering for PostgreSQL Logical Replication