Consider the number of columns in the sort cost model

From: Andrei Lepikhov <lepihov(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Consider the number of columns in the sort cost model
Date: 2024-08-22 18:46:01
Message-ID: 8742aaa8-9519-4a1f-91bd-364aec65f5cf@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I would like to propose a slight elaboration of the sort cost model.
In practice, we frequently see the choice of suboptimal sortings, which
slump performance by 10-50%.

The key reason here is the optimiser's blindness to the fact that
sorting calls a comparison operator for each pair of attributes in the
tuple. Just for example:

SET work_mem = '128MB';
CREATE TABLE test (
x1 int,x2 int,x3 int,x4 int,x5 int,x6 int,x7 int,x8 int,payload text);
INSERT INTO test (x1,x2,x3,x4,x5,x6,x7,x8,payload) (
SELECT 1,2,3,4,5,6,7,(1E5-x), 'abc'||x
FROM generate_series(1,1E5) AS x);
VACUUM ANALYZE test;

Let's execute the following three queries:

EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x8;
EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;
EXPLAIN (ANALYZE)
SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x4,x5,x6,x7,x8;

My laptop executes these three queries in 100ms, 300ms, and 560ms,
respectively. At the same time, the costs of these query plans are
similar. This corner case shows that sorting matters and in the corner
case, dependence on the number of comparisons looks close to linear.

The patch in the attachment is a simplistic attempt to differentiate
these sortings by the number of columns. It is not ideal because
comparisons depend on the duplicates in each column, but it may be the
subject of further work.

Even such a trivial model allows us to resolve the case like the below:
CREATE INDEX ON test (x1,x2,x3,x8);
EXPLAIN (ANALYZE) SELECT * FROM test WHERE x1<9E4 ORDER BY x1,x2,x3,x8;

The current master will choose SeqScan for four columns as well as for a
single column. With the model, the optimiser will understand that
sorting four columns costs more than sorting a single column and switch
to IndexScan.

--
regards, Andrei Lepikhov

Attachment Content-Type Size
0001-Introduce-the-number-of-columns-in-the-cost-sort-mod.patch text/plain 19.7 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2024-08-22 18:56:55 Re: Partial aggregates pushdown
Previous Message Alvaro Herrera 2024-08-22 18:41:50 Re: [BUG] Fix DETACH with FK pointing to a partitioned table fails