Re: Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, tomas(at)vondra(dot)me
Subject: Re: Incremental Sort Cost Estimation Instability
Date: 2024-11-07 05:57:46
Message-ID: b5d15895-c9b9-4521-90dd-c862c732196c@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/8/24 11:33, Andrei Lepikhov wrote:
> On 9/23/24 20:02, Andrei Lepikhov wrote:
>> On 12/9/2024 12:12, David Rowley wrote:
>>> On Thu, 12 Sept 2024 at 21:51, Andrei Lepikhov <lepihov(at)gmail(dot)com>
> Minor change to make compiler and cfbot happy
Now, this thread looks connected to the [1]. However, it still has
independent profit, which can be discussed separately.
After the introduction of the em->em_ndistinct cache, I played around
with the idea of letting the estimate_num_groups use this cache.
Occasionally found out that we have one more instability case like the
following:

DROP TABLE IF EXISTS test;
CREATE TABLE test (x int, y int, z int);
INSERT INTO test (x,y,z) (SELECT random()*1E5, random()*2, random() FROM
generate_series(1,1e4));
VACUUM ANALYZE test;

EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY x,y;
EXPLAIN SELECT count(*) FROM test WHERE x=y GROUP BY y,x;

Here, you can see that depending on the initial order of grouping,
Postgres chooses different columns for grouping. Doing that causes
different estimations - one of them is definitely wrong:

GroupAggregate (cost=181.41..182.29 rows=50 width=16)
GroupAggregate (cost=181.41..181.82 rows=3 width=16)

That happens because when estimating the number of groups, Postgres
doesn't consider EquivalenceClass, which can let him correct group
estimation at a low price.
It may be done inside the make_pathkeys_for_sortclauses_extended by
choosing a column with a lower number of distinct, but IMO, it is better
to do it at the moment of the number of groups estimation.

Thoughts? Is it a real issue or just a non-practical corner case?

The new version of the patch is attached.

[1]
https://www.postgresql.org/message-id/flat/8742aaa8-9519-4a1f-91bd-364aec65f5cf%40gmail.com
--
regards, Andrei Lepikhov

Attachment Content-Type Size
v4-0001-Stabilise-incremental-sort-cost-calculation.patch text/x-patch 9.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-11-07 05:59:34 Re: not null constraints, again
Previous Message David G. Johnston 2024-11-07 05:47:35 Re: Doc: typo in ddl.sgml