Quick Links

Slow performance of collate "en_US.utf8"

From:	Alexey Borschev <a(dot)borschev(at)postgrespro(dot)ru>
To:	pgsql-performance(at)lists(dot)postgresql(dot)org
Subject:	Slow performance of collate "en_US.utf8"
Date:	2025-03-03 08:47:07
Message-ID:	0bb22133-7c65-42f0-9853-0b65713f85b0@postgrespro.ru
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Hi!

Thank everyone for Your answers!

It is now clear, that it is not PG issue and it will not be fixed
anytime soon.

I see that with pure numbers sorting en_US.utf8 is still well behind:

explain (analyze, costs, buffers, verbose)

select gen.id::text collate "C"

from generate_series(10000, 20000) AS gen(id)

order by 1 desc;

-- 3.5 ms

explain (analyze, costs, buffers, verbose)

select gen.id::text collate "en_US.utf8"

from generate_series(10000, 20000) AS gen(id)

order by 1 desc;

-- 19.8 ms

On the other hand, when I add limit 1, the difference become much less
for the reasons I do not understand:

explain (analyze, costs, buffers, verbose)

select gen.id::text collate "C"

from generate_series(10000, 20000) AS gen(id)

order by 1 desc

limit 1;

-- 1.82 ms

explain (analyze, costs, buffers, verbose)

select gen.id::text collate "en_US.utf8"

from generate_series(10000, 20000) AS gen(id)

order by 1 desc

limit 1;

-- 2.8 ms

In fact, I've got no database issues right now - just benchmarking
search speed of b-tree indexes on different columns types - int4, int8,
numeric, texts, uuids, and run into this corner case.

l hope to make a talk about this on one of the PG conferences some day.

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Abraham, Danny	2025-03-06 07:39:17	Asking for OK for a nasty trick to resolve PG CVE-2025-1094 i
Previous Message	Pavel Stehule	2025-03-01 07:23:24	Re: Re: proposal: schema variables