wrong rows and cost estimation when generic plan

From: James Pang <jamespang886(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: wrong rows and cost estimation when generic plan
Date: 2022-12-06 02:28:11
Message-ID: CAHgTRfdzZWubmcyXG+5aaVg3QZSBN+gmNY74ZMduz1-wDkLngw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,
It's a prepared sql statement on a non-partitioned table , 16millions
tuples and multiple indexes on this table. pk_xxxxx primary
key (aid,bid,btype) all 3 cols are bigint datatype, there is another index
idx_xxxxx(starttime,endtime) , both cols are "timestamp(0) without time
zone".
the data distribution is skewed, not even. with first 5 times execution
custom_plan, optimizer choose primary key, but when it start building
generic plan and choose another index idx_xxxx, obviously generic plan make
significant different rows and cost estimation.
below is the sql , sensitive info got masked here (tablename,
columnname) .

--with custom_plan
Update on xxxxx (cost=0.56..8.60 rows=1 width=2923) (actual
time=0.030..0.031 rows=0 loops=1)
Buffers: shared hit=4
-> Index Scan using pk_xxxxx on xxxxxxx (cost=0.56..8.60 rows=1
width=2923) (actual time=0.028..0.028 rows=0 loops=1)
Index Cond: ((aid = '14654072'::bigint) AND (bid =
'243379969878556159'::bigint) AND (btype = '0'::bigint))
Filter: ((password IS NULL) AND ...) AND (starttime = '2071-12-31
00:00:00'::timestamp without time zone) AND (endtime = '2072-01-01
00:00:00'::timestamp without time zone) AND (opentime = '2022-11-07 09:
40:26'::timestamp without time zone)
Buffers: shared hit=4
Planning Time: 1.575 ms
Execution Time: 0.123 ms

--after 5 times execution, it start to build generic plan and thought
generic plan cost=0.44..8.48 that less than the customer plan ,so it choose
generic plan for following sql executions,
Update on xxxxx (cost=0.44..8.48 rows=1 width=2923) (actual
time=8136.243..8136.245 rows=0 loops=1)
Buffers: shared hit=1284549
-> Index Scan using idx_xxxxx_time on xxxxx (cost=0.44..8.48 rows=1
width=2923) (actual time=8136.242..8136.242 rows=0 loops=1)
Index Cond: ((starttime = $7) AND (endtime = $8))
Filter: ((password IS NULL) AND ...(aid = $4) AND (bid = $5) AND
(btype = $6) AND...
Rows Removed by Filter: 5534630
Buffers: shared hit=1284549
Planning Time: 0.754 ms
Execution Time: 8136.302 ms

as a workaround, I remove "starttime" and "endtime" stats tuple from
pg_statistic, and optimizer use a DEFAULT value with NULL stats tuple so
that index_path cost > the primary key index_path cost, following eqsel
function logic, postgres/selfuncs.c at REL_13_STABLE · postgres/postgres ·
GitHub
<https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/utils/adt/selfuncs.c>
optimzer is very complicated, could you direct me how optimizer to do
selectivity estimation when building generic plan, for this case? for
custom_plan, optimizer knows boundparams values, but when generic_plan,
planner() use boundparams=NULL, it try to calculate average value based on
mcv list of the index attributes (starttime,endtime) ?
please check attached about sql details and pg_stats tuple for the
index attributes.

Thanks,

James

Attachment Content-Type Size
wrong_selectivity_estimation_generic_plan.txt text/plain 14.6 KB

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message James Pang (chaolpan) 2022-12-06 05:28:33 RE: wrong rows and cost estimation when generic plan
Previous Message Jeff Janes 2022-12-04 03:50:57 Re: Catching up with performance & PostgreSQL 15