Re: long analyze, libc bug and libicu

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: long analyze, libc bug and libicu
Date: 2018-07-07 14:35:30
Message-ID: 70603.1530974130@sss.pgh.pa.us
Lists: pgsql-bugs

Grigory Smolkin <g(dot)smolkin(at)postgrespro(dot)ru> writes:
> On 07/07/2018 10:10 AM, Peter Eisentraut wrote:
>> On 05.07.18 17:05, Grigory Smolkin wrote:
>>> Why does ANALYZE ignore the column's COLLATE?

>> I think the statistics would be mostly the same independent of which
>> collation you use. This could possibly be refined, but I don't think
>> it's a major problem right now.

I don't actually believe that the stats would be mostly the same.
Yes, we ought to arrive at the same MCV list, ndistinct, etc, but the
histogram depends critically on the sort order. In particular its
endpoints, and estimates for comparison values near the endpoints,
might be very much different.
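A toy sketch of the point above, in Python rather than PostgreSQL's C code. It assumes a simplified statistics model that is loosely inspired by, but not identical to, what ANALYZE computes: an MCV list, an ndistinct count, and an equi-depth histogram built over the non-MCV values. Byte order (what a "C" collation's strcmp gives) is compared with a case-insensitive order via `str.casefold`, which stands in for a linguistic collation. Real glibc/ICU collations are far more elaborate. The names `column_stats`, `num_buckets` and `num_mcv` are hypothetical.

```python
from collections import Counter


def column_stats(values, sort_key, num_buckets=2, num_mcv=1):
    """Toy per-column stats; NOT PostgreSQL's actual ANALYZE algorithm."""
    counts = Counter(values)
    # MCV list and ndistinct depend only on equality, not on sort order.
    mcv = sorted(v for v, _ in counts.most_common(num_mcv))
    ndistinct = len(counts)
    # The histogram is built over the remaining values, ordered by the
    # chosen "collation" (sort_key), so its bounds depend on that order.
    rest = sorted((v for v in values if v not in mcv), key=sort_key)
    step = (len(rest) - 1) / num_buckets
    bounds = [rest[round(i * step)] for i in range(num_buckets + 1)]
    return mcv, ndistinct, bounds


values = ["apple", "apple", "apple", "Banana", "cherry", "Date", "egg", "Fig"]

# Byte order: uppercase letters sort before all lowercase ones.
byte_order = column_stats(values, sort_key=lambda s: s.encode("utf-8"))
# Case-insensitive order, standing in for a linguistic collation.
case_fold = column_stats(values, sort_key=str.casefold)

print(byte_order)  # (['apple'], 6, ['Banana', 'Fig', 'egg'])
print(case_fold)   # (['apple'], 6, ['Banana', 'Date', 'Fig'])
```

The MCV list and ndistinct agree, but the histogram's middle bound and upper endpoint do not. A planner estimating something like `col > 'd'` from one set of bounds while evaluating it under the other collation could therefore be badly off, especially near the endpoints.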

However, this is something that was left for future research when
we added collations, and nobody's really followed up on that.
Should ANALYZE/the planner care about collation (perhaps only for
specific stats types)? Does that go as far as ignoring stats that don't
match the query operator's collation? Should we consider recording stats
for more than one collation, and if so which ones? What are the
backwards-compatibility issues involved in changing something like this?

Grigory's proposal amounts to assuming that the column's assigned
collation is the only one of interest, which might be true but it
needs some defense. In any case it wouldn't end up being a three-line
patch; there's a whole lot of downstream work to consider.

But besides that, I've got no sympathy for forcing through a change
in this area just on the grounds that some platform's strcoll_l is
ridiculously slow with certain collations. The right answer for that
is to lobby the libc maintainers to fix strcoll_l, especially since
the odds of us changing this in released branches are nil.

regards, tom lane
