Re: Please help! Query jumps from 1s -> 4m

From: James Thompson <james(at)jthompson(dot)dev>
To: pgsql-performance(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, laurenz(dot)albe(at)cybertec(dot)at, dgrowleyml(at)gmail(dot)com
Subject: Re: Please help! Query jumps from 1s -> 4m
Date: 2020-05-13 14:17:03
Message-ID: CABoe=cTRMXFBTJ93nVk=JyPsFLTfxBZ+emGQU_wzPxiX40tMQA@mail.gmail.com
Lists: pgsql-performance

Just to follow up on this...
Tried increasing the stats targets last week and re-analyzing, but the query
was just as bad.
Ended up increasing prepareThreshold to prevent server-side prepares for now
(and thus the later switch to a generic plan); a rough sketch of the setting
is below. This 'fixed' the issue and had no noticeable negative effect for
our workloads.
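
For reference, the change looks roughly like the sketch below, assuming the
client is pgjdbc (which is where prepareThreshold lives); the host, database,
credentials and query are placeholders rather than our real setup.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;

public class PrepareThresholdWorkaround {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "app_user");          // placeholder credentials
        props.setProperty("password", "app_password");  // placeholder credentials
        // Raise the threshold far above the number of times any statement is
        // re-executed on one connection, so the driver keeps using the unnamed
        // statement (planned with the actual parameter values every time) and
        // effectively never switches to a named server-side prepared statement
        // after the default of five executions.
        props.setProperty("prepareThreshold", "1000000");

        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db-host:5432/appdb", props)) {  // placeholder URL
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT 1")) {  // stand-in for the real query
                ps.executeQuery().close();
            }
        }
    }
}

The same parameter can also go directly in the JDBC URL
(?prepareThreshold=...) or be set through whatever the connection pool
exposes for driver properties.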

I still don't understand why the plan being off makes the query so much
slower in this case (the plans I shared in the last email don't look too
different; I don't see how the filter can add two minutes of execution time
to an index-only scan). If anyone does have thoughts on what could be
happening I'd be very interested to hear them, but the main performance
problem is effectively solved.

Thanks all for the valuable help getting to the bottom of what was
happening.

On Tue, 5 May 2020 at 22:42, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> James Thompson <james(at)jthompson(dot)dev> writes:
> > The slowness occurs when the prepared statement changes to a generic plan.
>
> > Initial plan:
> > ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..2549.70 rows=70 width=36) (actual time=1.901..45.256 rows=65000 loops=1)
> >       Output: table1alias2.uniqueid
> >       Index Cond: ((table1alias2.col20 = '12345'::bigint) AND (table1alias2.col8 = ANY ('{c5986b02-3a02-4639-8147-f286972413ba,... 98ed24b1-76f5-4b0e-bb94-86cf13a4809c}'::text[])))
> >       Heap Fetches: 10
> >       Buffers: shared hit=5048
>
> > after 5 executions of the statement:
> > ->  Index Only Scan using table1_typea_include_uniqueid_col16_idx on table1 table1alias2  (cost=0.56..17.23 rows=1 width=36) (actual time=125.344..126877.822 rows=65000 loops=1)
> >       Output: table1alias2.uniqueid
> >       Index Cond: (table1alias2.col20 = $1001)
> >       Filter: ((table1alias2.col8)::text = ANY ((ARRAY[$1, ..., $1000])::text[]))
> >       Rows Removed by Filter: 2670023
> >       Heap Fetches: 428
> >       Buffers: shared hit=45933 read=42060 dirtied=4
>
> Yeah, this is a dynamic we've seen before. The rowcount estimate, and
> hence the cost estimate, for the plan with explicit parameter values is
> way off; but the estimate for the generic plan is even more way off,
> causing the system to falsely decide that the latter is cheaper.
>
> I've speculated about refusing to believe generic cost estimates if they are
> more than epsilon less than the concrete cost estimate, but it's not quite
> clear how that should work or whether it'd have its own failure modes.
>
> The one thing that is totally clear is that these rowcount estimates are
> crappy. Can you improve them by increasing the stats target for that
> table? Maybe with less-garbage-y inputs, the system would make the right
> plan choice here.
>
> regards, tom lane
>
