From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Bob Jolliffe <bobjolliffe(at)gmail(dot)com> |
Cc: | Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
Subject: | Re: How can sort performance be so different |
Date: | 2019-02-20 22:25:01 |
Message-ID: | CAH2-Wz=t-Seb=vPx4yTTe0mNsF4xknxeu63s5s-He71pKiNAxA@mail.gmail.com |
Lists: | pgsql-performance |
On Wed, Feb 20, 2019 at 1:42 PM Bob Jolliffe <bobjolliffe(at)gmail(dot)com> wrote:
> It seems not to be (completely) particular to the installation.
> Testing on different platforms we found variable speed difference
> between 100x and 1000x slower, but always a considerable order of
> magnitude. The very slow performance comes from sorting Lao
> characters using en_US.UTF-8 collation.
I knew that some collations were slower, generally for reasons that
make some sense. For example, I was aware that ICU's use of Japanese
standard JIS X 4061 is particularly complicated and expensive, but
produces the most useful possible result from the point of view of a
Japanese speaker. Apparently glibc does not use that algorithm, and so
offers a less useful sort order (though it may actually be faster in
that particular case).
I suspect that the reasons why the Lao locale sorts so much slower may
also have something to do with the intrinsic cost of supporting more
complicated rules. However, it's such a ridiculously large difference
that it also seems likely that somebody was disinclined to go to the
effort of optimizing it. The ICU people found that to be a tractable
goal, but they may have had to work at it. I also have a vague notion
that there are special cases that are more or less only useful for
sorting French. These complicate the implementation of UCA-style
(Unicode Collation Algorithm) algorithms.
I am only speculating, based on what I've heard about other cases --
perhaps this explanation is totally wrong. I know a lot more about
this stuff than most people on this mailing list, but I'm still far
from being an expert.
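
One rough way to see how much of the cost comes from the C library's
collation code itself, independent of PostgreSQL, is to time a
strcoll()-based sort on Lao text against the same sort on ASCII text.
The sketch below is only an illustration: the helper names
(random_words, time_collated_sort), the word counts, and the sample
alphabets are made up for this example, and it assumes the
en_US.UTF-8 locale is installed (it falls back to the process's
current locale if not). It says nothing about why one case is slower.

```python
import locale
import random
import time
from functools import cmp_to_key

# Sample alphabets (illustrative only): Lao consonants and ASCII letters.
LAO_LETTERS = "ກຂຄງຈຊຍດຕຖທນບປຜຝພຟມຢຣລວສຫອຮ"
ASCII_LETTERS = "abcdefghijklmnopqrstuvwxyz"


def random_words(alphabet, count=2000, length=8, seed=0):
    """Build `count` pseudo-random strings drawn from `alphabet`."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length))
            for _ in range(count)]


def time_collated_sort(words, locale_name="en_US.UTF-8"):
    """Sort `words` using the C library's strcoll(); return (seconds, result).

    If `locale_name` is not installed, the current locale is left in place,
    so the timing then reflects that locale instead.
    """
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        pass  # assumption violated: requested locale is unavailable
    start = time.perf_counter()
    result = sorted(words, key=cmp_to_key(locale.strcoll))
    return time.perf_counter() - start, result


if __name__ == "__main__":
    ascii_secs, _ = time_collated_sort(random_words(ASCII_LETTERS))
    lao_secs, _ = time_collated_sort(random_words(LAO_LETTERS))
    print(f"ASCII: {ascii_secs:.4f}s  Lao: {lao_secs:.4f}s")
```

The numbers will vary with the platform and C library version, so
this is best treated as a way to reproduce the comparison locally
rather than as a benchmark.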
--
Peter Geoghegan