Adding NOTICE for differences between estimated and actual rows

From: Ilia Evdokimov <ilya(dot)evdokimov(at)tantorlabs(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Adding NOTICE for differences between estimated and actual rows
Date: 2024-09-05 16:05:06
Message-ID: aaf18433-82d3-417b-a1f5-12e62fedffb8@tantorlabs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers,

It's quite common that poor query performance can be attributed to
inaccurate row estimations by the planner. To make it easier to detect
these discrepancies, rather than scrutinizing the estimates manually, it
would be helpful to output a dedicated |NOTICE| message.

In the current patch, I've introduced a new GUC parameter called
'estimated_rows_scale_factor'. If the ratio of the estimated rows to the
actual rows is less than this factor, or if the estimated rows
significantly exceed the actual rows (when the ratio is greater than
this factor), a NOTICE message will be printed. The message reads:
"Estimated rows (%.0f) less(greater) than actual rows (%.0f)."

Here is an example:

CREATE TABLE t(a int, b int);
INSERT INTO t SELECT x/10, x FROM generate_series(1,10000000) g(x);
ANALYZE;

SET estimated_rows_scale_factor = 0.9;

EXPLAIN ANALYZE SELECT * FROM t WHERE a > 10 AND b <= 200;
NOTICE:  Estimated rows (1000) greater than actual rows (91).
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..107848.00 rows=1000 width=8) (actual
time=0.446..122.476 rows=91 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t  (cost=0.00..106748.00 rows=417 width=8)
(actual time=77.657..118.000 rows=30 loops=3)
         Filter: ((a > 10) AND (b <= 200))
         Rows Removed by Filter: 3333303
 Planning Time: 0.097 ms
 Execution Time: 122.502 ms
(8 rows)

EXPLAIN ANALYZE SELECT * FROM t WHERE a = 10 AND b <= 200;
NOTICE:  Estimated rows (1) less than actual rows (10).
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..107748.10 rows=1 width=8) (actual
time=0.280..104.752 rows=10 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on t  (cost=0.00..106748.00 rows=1 width=8)
(actual time=66.493..101.102 rows=3 loops=3)
         Filter: ((b <= 200) AND (a = 10))
         Rows Removed by Filter: 3333330
 Planning Time: 0.129 ms
 Execution Time: 104.768 ms
(8 rows)

If you have any suggestions regarding the wording of the message, its
placement, or if you'd like to see a different criterion used, I would
greatly appreciate your feedback.

Looking forward to your thoughts and suggestions.

--
Regards,
Ilia Evdokimov,
Tantor Labs LCC.

Attachment Content-Type Size
v1-0001-Add-message-when-estimated-rows-differ-from-actual.patch text/x-patch 3.8 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Maxim Orlov 2024-09-05 16:08:58 Re: Test 041_checkpoint_at_promote.pl faild in installcheck due to missing injection_points
Previous Message Noah Misch 2024-09-05 15:54:35 Re: Use read streams in pg_visibility