Re: How can sort performance be so different

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Bob Jolliffe <bobjolliffe(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: How can sort performance be so different
Date: 2019-01-30 23:54:08
Message-ID: CAH2-WznhLA6TbVQ4Mc5aNUp7hBxbAn6__Mtf7oPeTn7FeOSfmw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Wed, Jan 30, 2019 at 3:57 AM Bob Jolliffe <bobjolliffe(at)gmail(dot)com> wrote:
> (i) whether the sort order makes sense for the Laos names; and
> (ii) what the locale settings are on the production server where the
> problem was first reported.
>
> There will be some turnaround before I get this information. I am
> guessing that the database is using "en_US" rather than anything Laos
> specific. In which case "C" would probably be no worse re sort order.
> But will know better soon.
>
> This has been a long but very fruitful investigation. Thank you all for input.

If you can find a way to use an ICU collation, it may be possible to
get Laotian sort order with performance that's a lot closer to the
performance you see with the C locale. The difference that you're
seeing is obviously explainable in large part by the C locale using
the abbreviated keys technique. The system glibc's collations cannot
use this optimization.

I believe that some locales have inherently more expensive
normalization processes (comparisons) than others, but if you can
effective amortize the cost per key by building an abbreviated key, it
may not matter that much. And, ICU may be faster than glibc anyway.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Nandakumar M 2019-01-31 07:30:36 Setting effective_cache size
Previous Message Pavel Stehule 2019-01-30 16:34:24 Re: [HACKERS] proposal: schema variables