From: | Alexey Borschev <a(dot)borschev(at)postgrespro(dot)ru> |
---|---|
To: | pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Slow performance of collate "en_US.utf8" |
Date: | 2025-03-03 08:47:07 |
Message-ID: | 0bb22133-7c65-42f0-9853-0b65713f85b0@postgrespro.ru |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
Hi!
Thank everyone for Your answers!
It is now clear, that it is not PG issue and it will not be fixed
anytime soon.
I see that with pure numbers sorting en_US.utf8 is still well behind:
explain (analyze, costs, buffers, verbose)
select gen.id::text collate "C"
from generate_series(10000, 20000) AS gen(id)
order by 1 desc;
-- 3.5 ms
explain (analyze, costs, buffers, verbose)
select gen.id::text collate "en_US.utf8"
from generate_series(10000, 20000) AS gen(id)
order by 1 desc;
-- 19.8 ms
On the other hand, when I add limit 1, the difference become much less
for the reasons I do not understand:
explain (analyze, costs, buffers, verbose)
select gen.id::text collate "C"
from generate_series(10000, 20000) AS gen(id)
order by 1 desc
limit 1;
-- 1.82 ms
explain (analyze, costs, buffers, verbose)
select gen.id::text collate "en_US.utf8"
from generate_series(10000, 20000) AS gen(id)
order by 1 desc
limit 1;
-- 2.8 ms
In fact, I've got no database issues right now - just benchmarking
search speed of b-tree indexes on different columns types - int4, int8,
numeric, texts, uuids, and run into this corner case.
l hope to make a talk about this on one of the PG conferences some day.
From | Date | Subject | |
---|---|---|---|
Next Message | Abraham, Danny | 2025-03-06 07:39:17 | Asking for OK for a nasty trick to resolve PG CVE-2025-1094 i |
Previous Message | Pavel Stehule | 2025-03-01 07:23:24 | Re: Re: proposal: schema variables |