From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Thompson <james(at)jthompson(dot)dev>
Cc: pgsql-performance(at)postgresql(dot)org, Justin Pryzby <pryzby(at)telsasoft(dot)com>, laurenz(dot)albe(at)cybertec(dot)at, dgrowleyml(at)gmail(dot)com
Subject: Re: Please help! Query jumps from 1s -> 4m
Date: 2020-05-05 21:42:48
Message-ID: 26352.1588714968@sss.pgh.pa.us
Lists: pgsql-performance
James Thompson <james(at)jthompson(dot)dev> writes:
> The slowness occurs when the prepared statement changes to a generic plan.
> Initial plan:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual time=1.901..45.256 rows=65000 loops=1)
>       Output: table1alias2.uniqueid
>       Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,... 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
>       Heap Fetches: 10
>       Buffers: shared hit=5048
> after 5 executions of the statement:
> ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual time=125.344..126877.822 rows=65000 loops=1)
>       Output: table1alias2.uniqueid
>       Index Cond: (table1alias2.col20 = $1001)
>       Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
>       Rows Removed by Filter: 2670023
>       Heap Fetches: 428
>       Buffers: shared hit=45933 read=42060 dirtied=4
Yeah, this is a dynamic we've seen before. The rowcount estimate, and
hence the cost estimate, for the plan with explicit parameter values is
way off; but the estimate for the generic plan is even further off,
causing the system to falsely decide that the latter is cheaper.
I've speculated about refusing to believe generic cost estimates if they are
more than epsilon less than the concrete cost estimate, but it's not quite
clear how that should work or whether it'd have its own failure modes.
The one thing that is totally clear is that these rowcount estimates are
crappy. Can you improve them by increasing the stats target for that
table? Maybe with less-garbage-y inputs, the system would make the right
plan choice here.
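As a sketch of what that might look like, assuming col20 and col8 on table1 are the columns driving these estimates (1000 is just an example value; the per-column default comes from default_statistics_target, which is 100):

    -- raise the per-column statistics targets, then refresh the stats
    ALTER TABLE table1 ALTER COLUMN col20 SET STATISTICS 1000;
    ALTER TABLE table1 ALTER COLUMN col8  SET STATISTICS 1000;
    ANALYZE table1;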
regards, tom lane