
Re: Optimization for lower(), upper(), casefold() functions.

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Alexander Borisov <lex(dot)borisov(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Optimization for lower(), upper(), casefold() functions.
Date:	2025-03-12 19:39:27
Message-ID:	2c10910c21b16cb9a0e5f67d80589e3acae2b6ef.camel@j-davis.com
Lists:	pgsql-hackers

On Wed, 2025-03-12 at 19:55 +0300, Alexander Borisov wrote:
> 1. Added static for casemap() function. Otherwise the compiler could
> not optimize the code and the performance dropped significantly.

Oops, it was static, but I made it external just to see what code it
generated. I didn't intend to publish it as an external function --
thank you for catching that!

> 2. Added a fast path for codepoint < 0x80.
>
> v3j-0002:
> In the fast path for codepoints < 0x80, I added a premature return.
> This avoided additional insertions, which increased performance.

What do you mean by "additional insertions"?

Also, should we just compute the results in the fast path? We don't
even need a table. Rough patch attached to go on top of v4-0001.

Should we properly return CASEMAP_SELF when *simple == u1, or is it ok
to return CASEMAP_SIMPLE? It probably doesn't matter performance-wise,
but it feels more correct to return CASEMAP_SELF.

>
> Perhaps for general
> beauty it should be made static inline, I don't have a rigid position
> here.

We ordinarily use "static inline" if it's in a header file, and
"static" if it's in a .c file, so I'll do it that way.
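
As a small illustration of that convention, the sketch below shows both forms in one listing. The file names and function names are hypothetical and are not taken from the patch.

```c
/* ---- would live in a header, e.g. a hypothetical sketch_case.h ---- */
#include <stdbool.h>
#include <stdint.h>

/* "static inline": each including .c file gets its own inlinable copy. */
static inline bool
sketch_is_ascii(uint32_t cp)
{
    return cp < 0x80;
}

/* ---- would live in a source file, e.g. a hypothetical sketch_case.c ---- */

/* Plain "static": private to this file; the compiler may still inline it. */
static uint32_t
sketch_ascii_upper(uint32_t cp)
{
    if (sketch_is_ascii(cp) && cp >= 'a' && cp <= 'z')
        return cp - ('a' - 'A');
    return cp;
}
```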

> I was purely based on existing approaches in Postgres, the
> Normalization Forms have them separated into different headers. Just
> trying to be consistent with existing approaches.

I think that was done for normalization primarily because it's not used
#ifndef FRONTEND (see unicode_norm.c), and perhaps also because it's
just a more complex function worthy of its own file.

I looked into the history, and commit 783f0cc64d explains why perfect
hashing is not used in the frontend:

"The decomposition table remains the same, getting used for the binary
search in the frontend code, where we care more about the size of the
libraries like libpq over performance..."
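
For readers unfamiliar with the trade-off, a binary search over a sorted table needs no extra hash data, only the table itself. The sketch below uses a hypothetical three-entry table and hypothetical names; it is not the layout used in unicode_norm.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified entry: one code point and its two-part decomposition. */
typedef struct
{
    uint32_t codepoint;
    uint32_t base;
    uint32_t mark;
} SketchDecomp;

/* Must stay sorted by codepoint for bsearch() to work. */
static const SketchDecomp sketch_table[] = {
    {0x00C0, 0x0041, 0x0300},   /* A with grave */
    {0x00C9, 0x0045, 0x0301},   /* E with acute */
    {0x00F1, 0x006E, 0x0303},   /* n with tilde */
};

static int
sketch_cmp(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *) key;
    uint32_t e = ((const SketchDecomp *) elem)->codepoint;

    return (k > e) - (k < e);
}

/* Returns the matching entry, or NULL if cp has no entry in the table. */
static const SketchDecomp *
sketch_lookup(uint32_t cp)
{
    return bsearch(&cp, sketch_table,
                   sizeof(sketch_table) / sizeof(sketch_table[0]),
                   sizeof(sketch_table[0]), sketch_cmp);
}
```

A perfect hash would replace the O(log n) search with a constant-time probe, at the cost of shipping the generated hash data alongside the table.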

Regards,
Jeff Davis

Attachment	Content-Type	Size
vtmp-0001-fastpath.patch	text/x-patch	1.5 KB

In response to

Re: Optimization for lower(), upper(), casefold() functions. at 2025-03-12 16:55:31 from Alexander Borisov

Responses

Re: Optimization for lower(), upper(), casefold() functions. at 2025-03-12 20:39:13 from Alexander Borisov
