Re: How can sort performance be so different

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Bob Jolliffe <bobjolliffe(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: How can sort performance be so different
Date: 2019-02-20 22:25:01
Message-ID: CAH2-Wz=t-Seb=vPx4yTTe0mNsF4xknxeu63s5s-He71pKiNAxA@mail.gmail.com
Lists: pgsql-performance

On Wed, Feb 20, 2019 at 1:42 PM Bob Jolliffe <bobjolliffe(at)gmail(dot)com> wrote:
> It seems not to be (completely) particular to the installation.
> Testing on different platforms we found variable speed difference
> between 100x and 1000x slower, but always a considerable order of
> magnitude. The very slow performance comes from sorting Lao
> characters using en_US.UTF-8 collation.

I knew that some collations were slower, generally for reasons that
make some sense. For example, I was aware that ICU's use of Japanese
standard JIS X 4061 is particularly complicated and expensive, but
produces the most useful possible result from the point of view of a
Japanese speaker. Apparently glibc does not use that algorithm, and so
offers a less useful sort order (though it may actually be faster in
that particular case).

I suspect that the reasons why the Lao locale sorts so much slower may
also have something to do with the intrinsic cost of supporting more
complicated rules. However, it's such a ridiculously large difference
that it also seems likely that somebody was disinclined to go to the
effort of optimizing it. The ICU people found that to be a tractable
goal, but they may have had to work at it. I also have a vague notion
that there are special cases that are more or less only useful for
sorting French. These complicate the implementation of UCA-style
algorithms.
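The best-known French special case is "backwards" secondary ordering: when two words differ only in their accents, traditional French compares those accents starting from the end of the word, not the start. The toy Python sketch below illustrates why a flag like that complicates a multi-level, UCA-style comparison. It is only an illustration, not how glibc or ICU actually implement collation. The `collation_key` function and the `ACCENT_WEIGHT` table are hypothetical, and the weights are made up rather than taken from real collation tables.

```python
import unicodedata

# Hypothetical accent weights, for illustration only. Real UCA/ICU
# secondary weights come from published collation tables, not this dict.
ACCENT_WEIGHT = {
    "\u0301": 1,  # combining acute accent
    "\u0300": 2,  # combining grave accent
    "\u0302": 3,  # combining circumflex accent
}


def collation_key(word: str, french_backwards: bool = False):
    """Toy two-level sort key: base letters first, accents as a tie-breaker."""
    base = []     # primary level: letters with accents stripped
    accents = []  # secondary level: one accent weight per base letter (0 = none)
    # NFD splits precomposed letters such as "ô" into "o" + U+0302.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            if accents:
                # Attach the accent to the most recent base letter.
                accents[-1] = ACCENT_WEIGHT.get(ch, 9)
        else:
            base.append(ch.lower())
            accents.append(0)
    if french_backwards:
        # Traditional French: compare accents from the end of the word.
        accents.reverse()
    return ("".join(base), tuple(accents))


words = ["côté", "cote", "coté", "côte"]
print(sorted(words, key=collation_key))
print(sorted(words, key=lambda w: collation_key(w, french_backwards=True)))
```

All four words share the primary key "cote", so only the accent level decides their order. Forward comparison gives cote, coté, côte, côté. The backwards rule gives cote, côte, coté, côté, which is the classic French ordering. A real implementation has to support both orders at one level while keeping other levels forward, on top of contractions, expansions, and many more levels.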

I am only speculating, based on what I've heard about other cases --
perhaps this explanation is totally wrong. I know a lot more about
this stuff than most people on this mailing list, but I'm still far
from being an expert.

--
Peter Geoghegan
