Incremental Sort Cost Estimation Instability

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Subject: Incremental Sort Cost Estimation Instability
Date: 2024-06-26 15:00:27
Message-ID: ba0edc53-4b1f-4c67-92d1-29aeddb36a18@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While designing an improvement for the cost sort model, I discovered
that the query plan can vary if we slightly change the query text
without pushing semantic differences. See the example below:

CREATE TABLE test(x integer, y integer,z text);
INSERT INTO test (x,y) SELECT x, 1 FROM generate_series(1,1000000) AS x;
CREATE INDEX ON test(x);
CREATE INDEX ON test(y);
VACUUM ANALYZE;
SET max_parallel_workers_per_gather = 0;

First query:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t1.x=t2.y AND t1.y=t2.x GROUP BY t1.x,t1.y;

And the second one - just reverse the left and right sides of
expressions in the WHERE condition:

EXPLAIN (ANALYZE, TIMING OFF) SELECT count(*) FROM test t1, test t2
WHERE t2.y=t1.x AND t2.x=t1.y GROUP BY t1.x,t1.y;

You can see two different plans here:

GroupAggregate (cost=37824.89..37824.96 rows=1 width=16)
Group Key: t1.y, t1.x
-> Incremental Sort (cost=37824.89..37824.94 rows=2 width=8)
Sort Key: t1.y, t1.x
Presorted Key: t1.y
-> Merge Join (cost=0.85..37824.88 rows=1 width=8)
Merge Cond: (t1.y = t2.x)
Join Filter: (t2.y = t1.x)
-> Index Scan using test_y_idx on test t1
-> Index Scan using test_x_idx on test t2

GroupAggregate (cost=37824.89..37824.92 rows=1 width=16)
Group Key: t1.x, t1.y
-> Sort (cost=37824.89..37824.90 rows=1 width=8)
Sort Key: t1.x, t1.y
Sort Method: quicksort Memory: 25kB
-> Merge Join (cost=0.85..37824.88 rows=1 width=8)
Merge Cond: (t1.y = t2.x)
Join Filter: (t2.y = t1.x)
-> Index Scan using test_y_idx on test t1
-> Index Scan using test_x_idx on test t2

Don't mind for now that the best plan is to do IncrementalSort with
presorted key t1.x. Just pay attention to the fact that the plan has
altered without any valuable reason.
The cost_incremental_sort() routine causes such behaviour: it chooses
the expression to estimate the number of groups from the first
EquivalenceClass member that relies on the syntax.
I tried to invent a simple solution to fight this minor case. But the
most clear and straightforward way here is to save a reference to the
expression that triggered the PathKey creation inside the PathKey itself.
See the sketch of the patch in the attachment.
I'm not sure this instability is worth fixing this way, but the
dependence of the optimisation outcome on the query text looks buggy.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
0001-Add-a-reference-to-the-expression-which-has-triggere.patch text/plain 9.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2024-06-26 15:24:38 Re: pgindent exit status if a file encounters an error
Previous Message David G. Johnston 2024-06-26 14:58:55 Re: [PATCH] Add ACL (Access Control List) acronym