| From: | Johannes Graën <johannes(at)selfnet(dot)de> | 
|---|---|
| To: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Degression (PG10 > 11, 12 or 13) | 
| Date: | 2021-05-28 23:19:52 | 
| Message-ID: | 0ff0701d-48fb-39f5-0760-443cc452daad@selfnet.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On 28/05/2021 18.24, Tom Lane wrote:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> pá 28. 5. 2021 v 16:12 odesílatel Johannes Graën <johannes(at)selfnet(dot)de>
>> napsal:
>>> When trying to upgrade an existing database from version 10 to 13 I came
>>> across a degression in some existing code used by clients. Further
>>> investigations showed that performance measures are similar in versions
>>> 11 to 13, while in the original database on version 10 it's around 100
>>> times faster. I could boil it down to perl functions used for sorting.
> 
>> Are you sure, so all databases use the same encoding and same locale?
> 
> Yeah ... I don't know too much about the performance of Perl regexps,
> but it'd be plausible that it varies depending on locale setting.
It probably wasn't Perl at all. Thanks to the hint I checked the initial
database again and, while encoding and ctype are set to UTF8, the
collation is C, which makes a huge difference:
... order by tab(attr) => Execution Time: 51429.875 ms
... order by tab(attr collate "C") => Execution Time: 537.757 ms
in the original database. Any other version yields similar times.
On 28/05/2021 17.47, Tomas Vondra wrote:
> That function is pretty much just a sequence of ~120 regular
> expressions, doing something similar to unaccent(). I wonder if we're
> calling the function much more often, perhaps due to some changes in the
> sort code (the function is immutable, but that does not guarantee it's
> called just once).
> Also, maybe try materializing the function results before doing the
> sort, perhaps like this:
>
> SELECT * FROM (select attr, func(attr) as fattr from tab offset 0) foo
> ORDER BY fattr;
I was expecting it to be called once in the process of sorting, and it
seems that this is actually true for all version and different
collations, but sorting for a collation that is not C requires
considerable more resources (that still needs to be shown for other
collations, but I see the overhead of having more or less complex
definitions vs. just comparing numbers).
That being said, I would have used unaccent or, if that wasn't an
option, maybe have those values calculated by a trigger function when
the corresponding rows are changed. But I don't control the code.
Now what keeps me wondering is how the sorting works internally and if
we could conclude that using the C collation in order expressions and
indexes is a general way to speed up queries - if the actual order is of
less importance.
Best
  Johannes
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Peter Geoghegan | 2021-05-29 00:27:08 | ANALYZE's dead tuple accounting can get confused | 
| Previous Message | Dean Gibson (DB Administrator) | 2021-05-28 22:13:58 | Re: AWS forcing PG upgrade from v9.6 a disaster |