From: | Alexander Borisov <lex(dot)borisov(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Optimization for lower(), upper(), casefold() functions. |
Date: | 2025-03-12 20:39:13 |
Message-ID: | 44005c3d-88f4-4a26-981f-fd82dfa8e313@gmail.com |
Lists: | pgsql-hackers |
12.03.2025 22:39, Jeff Davis wrote:
[...]
>> 2. Added a fast path for codepoint < 0x80.
>>
>> v3j-0002:
>> In the fast path for codepoints < 0x80, I added a premature return.
>> This avoided additional insertions, which increased performance.
>
> What do you mean "additional insertions"?
Sorry for my English. I mean that we return immediately inside the
if () condition, which avoids any further branching/checking.
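As a minimal sketch of that idea (not the code from the patch): the ASCII range is answered from a small table and the function returns right away, so no later branch is evaluated. The names `ascii_lower`, `init_ascii_lower`, `general_lower`, and `sketch_lower` are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical 128-entry table covering the ASCII range only. */
static uint8_t ascii_lower[128];

/* Fill the table once; a real implementation would likely use a
 * generated static table instead. */
static void
init_ascii_lower(void)
{
	for (int c = 0; c < 128; c++)
		ascii_lower[c] = (c >= 'A' && c <= 'Z')
			? (uint8_t) (c + ('a' - 'A'))
			: (uint8_t) c;
}

/* Hypothetical stand-in for the general (non-ASCII) lookup.
 * Only one mapping is wired in: U+00C4 -> U+00E4. */
static uint32_t
general_lower(uint32_t cp)
{
	return (cp == 0x00C4) ? 0x00E4 : cp;
}

static uint32_t
sketch_lower(uint32_t cp)
{
	/* Fast path: return immediately, skipping all later checks. */
	if (cp < 0x80)
		return ascii_lower[cp];

	return general_lower(cp);
}
```

Call `init_ascii_lower()` once before the first `sketch_lower()` call.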
> Also, should we just compute the results in the fast path? We don't
> even need a table. Rough patch attached to go on top of v4-0001.
>
> Should we properly return CASEMAP_SELF when *simple == u1, or is it ok
> to return CASEMAP_SIMPLE? It probably doesn't matter performance-wise,
> but it feels more correct to return CASEMAP_SELF.
That seems to disrupt the overall "beauty" of the approach. We would be
duplicating code and adding optimizations that bloat the code without
improving performance. I would refrain from such practices, especially
since we'll be changing all of that in the next patch (v4-0002).
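For reference, the table-free variant under discussion could look roughly like the sketch below. It is not the rough patch attached upthread; the enum definition and the function name `ascii_lower_computed` are assumptions made for illustration, and only the names CASEMAP_SELF and CASEMAP_SIMPLE come from the thread.

```c
#include <stdint.h>

/* Assumed result codes for this sketch only. */
typedef enum
{
	CASEMAP_SELF,				/* codepoint maps to itself */
	CASEMAP_SIMPLE				/* codepoint has a simple mapping */
} SketchCaseMapResult;

/* Lowercase an ASCII codepoint by arithmetic, with no table.
 * The caller is assumed to have checked u1 < 0x80 already. */
static SketchCaseMapResult
ascii_lower_computed(uint32_t u1, uint32_t *simple)
{
	if (u1 >= 'A' && u1 <= 'Z')
	{
		*simple = u1 + ('a' - 'A');
		return CASEMAP_SIMPLE;
	}

	/* Unchanged: report CASEMAP_SELF rather than CASEMAP_SIMPLE. */
	*simple = u1;
	return CASEMAP_SELF;
}
```

The extra branch that distinguishes the two result codes is the kind of added code being weighed here against its (likely negligible) performance effect.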
>>
>> Perhaps for general
>> beauty it should be made static inline, I don't have a rigid position
>> here.
>
> We ordinarily use "static inline" if it's in a header file, and
> "static" if it's in a .c file, so I'll do it that way.
Great, I've changed it. Performance is unaffected.
>> I was purely based on existing approaches in Postgres, the
>> Normalization Forms have them separated into different headers. Just
>> trying to be consistent with existing approaches.
>
> I think that was done for normalization primarily because it's not used
> #ifndef FRONTEND (see unicode_norm.c), and perhaps also because it's
> just a more complex function worthy of its own file.
>
> I looked into the history, and commit 783f0cc64d explains why perfect
> hashing is not used in the frontend:
>
> "The decomposition table remains the same, getting used for the binary
> search in the frontend code, where we care more about the size of the
> libraries like libpq over performance..."
I removed the extra file (unicode_case_func.h). You are right, we should
not create unnecessary clutter.
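To illustrate the size-versus-speed trade-off in the quoted commit message, here is a minimal sketch of a binary search over a sorted codepoint table, which needs no storage beyond the table itself. The struct, the four-entry table, and the function names are hypothetical; this is not the layout used in PostgreSQL's generated headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
	uint32_t	codepoint;
	uint32_t	mapped;
} SketchEntry;

/* Must stay sorted by codepoint for bsearch() to work. */
static const SketchEntry sketch_table[] = {
	{0x00C0, 0x00E0},
	{0x00C4, 0x00E4},
	{0x0410, 0x0430},
	{0x0411, 0x0431},
};

/* Compare a uint32_t key against an entry's codepoint. */
static int
sketch_cmp(const void *key, const void *elem)
{
	uint32_t	k = *(const uint32_t *) key;
	uint32_t	e = ((const SketchEntry *) elem)->codepoint;

	return (k > e) - (k < e);
}

/* Returns the matching entry, or NULL if cp is not in the table. */
static const SketchEntry *
sketch_find(uint32_t cp)
{
	return bsearch(&cp, sketch_table,
				   sizeof(sketch_table) / sizeof(sketch_table[0]),
				   sizeof(SketchEntry), sketch_cmp);
}
```

A perfect hash would replace the O(log n) search with a single probe, at the cost of an additional generated hash function and table in the binary.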
v5 attached.
Regards,
Alexander Borisov
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Refactor-convert_case-to-prepare-for-optimization.patch | text/plain | 6.1 KB |
v5-0002-Optimization-for-lower-upper-casefold-functions.patch | text/plain | 701.9 KB |