Re: autoanalyze creates bad plan, manual analyze fixes it?

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: autoanalyze creates bad plan, manual analyze fixes it?
Date: 2020-09-15 04:53:54
Message-ID: CAFj8pRDmBFM=Km17xPuZAoeL53sFwB1wZUB2UzhbXhjX31VSbA@mail.gmail.com
Lists: pgsql-performance

On Tue, Sep 15, 2020 at 1:11 AM Robert Treat <rob(at)xzilla(dot)net> wrote:

> Howdy folks,
>
> Recently I've run into a problem where autoanalyze causes a query
> plan to flip over to using an index which is about 10x slower, and
> the problem is fixed by running an analyze manually. Some relevant
> info:
>
> UPDATE sleeping_intents
>    SET raptor_after = '2020-09-14T19:21:03.581106'::timestamp,
>        status = 'requires_capture',
>        updated_at = '2020-09-14T16:21:03.581104+00:00'::timestamptz
>  WHERE sleeping_intents.id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid
>    AND sleeping_intents.status = 'init'
> RETURNING *;
>
> The plan generated after autoanalyze is:
>
> Update on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual time=57.945..57.945 rows=0 loops=1)
>   Buffers: shared hit=43942
>   ->  Index Scan using sleeping_intents_status_created_at_raptor_after_idx on sleeping_intents  (cost=0.70..7.11 rows=1 width=272) (actual time=57.943..57.943 rows=0 loops=1)
>         Index Cond: (status = 'init'::text)
>         Filter: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
>         Rows Removed by Filter: 1262
>         Buffers: shared hit=43942
> Planning time: 0.145 ms
> Execution time: 57.981 ms
>

This looks pretty strange - why do you need to read 43942 pages to find 1262 rows?
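
One possible cause (just a guess from here) is a lot of dead row versions that autovacuum has not cleaned up yet, so the index scan has to visit many dead entries. A quick check, using the standard pg_stat_user_tables statistics view:

    SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname = 'sleeping_intents';

If n_dead_tup is large, that would go a long way toward explaining the buffer count.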

Can you reindex this index? Maybe it is bloated.
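
Something like this sketch (pgstattuple is a contrib extension, and REINDEX ... CONCURRENTLY needs PostgreSQL 12 or newer - on older versions, build a replacement index with CREATE INDEX CONCURRENTLY and drop the old one):

    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- a noticeably low avg_leaf_density suggests bloat
    SELECT avg_leaf_density, leaf_fragmentation
      FROM pgstatindex('sleeping_intents_status_created_at_raptor_after_idx');

    -- rebuild without blocking writes (PostgreSQL 12+)
    REINDEX INDEX CONCURRENTLY sleeping_intents_status_created_at_raptor_after_idx;

And to your later question about seeing the cost the planner assigned to the losing plan: one trick is to hide the index inside a transaction that you roll back. Note DROP INDEX takes an ACCESS EXCLUSIVE lock while the transaction is open, so only do this on a test copy:

    BEGIN;
    DROP INDEX sleeping_intents_status_created_at_raptor_after_idx;
    -- shows the plan and estimated cost without that index available
    EXPLAIN UPDATE sleeping_intents SET status = 'requires_capture'
     WHERE id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid
       AND status = 'init';
    ROLLBACK;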

Regards

Pavel

>
> After I run analyze manually, the query plan changes to this:
>
> Update on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual time=0.023..0.023 rows=0 loops=1)
>   Buffers: shared hit=7
>   ->  Index Scan using sleeping_intents_pkey on sleeping_intents  (cost=0.57..8.59 rows=1 width=272) (actual time=0.022..0.022 rows=0 loops=1)
>         Index Cond: (id = 'r2d2dcc0-8a44-4d19-c3p0-28522233b836'::uuid)
>         Filter: (status = 'init'::text)
>         Rows Removed by Filter: 1
>         Buffers: shared hit=7
> Planning time: 0.092 ms
> Execution time: 0.066 ms
>
> Note that in the second query, it switches back to using the primary
> key index, which does seem like the logically better choice, even
> though it shows a higher estimated cost than the "bad" case (I
> understand the cost must change somewhere in the process, but there's
> no way to see it, afaict).
>
> In trying to determine why it switches, I dug up some likely useful info:
> Index definitions:
> (20 GB) "sleeping_intents_pkey" PRIMARY KEY, btree (id)
> (37 GB) "sleeping_intents_status_created_at_raptor_after_idx" btree
> (status, created_at DESC, raptor_after DESC)
>
> Basic info on the table:
> > select relid::regclass, n_live_tup, n_mod_since_analyze,
> >        analyze_count, autoanalyze_count
> >   from pg_stat_user_tables where relname = 'sleeping_intents';
>
>       relid       | n_live_tup | n_mod_since_analyze | analyze_count | autoanalyze_count
> ------------------+------------+---------------------+---------------+-------------------
>  sleeping_intents |  491171179 |             1939347 |             4 |                80
>
> (that count of modifications is from the last ~5 hours; the table is
> fairly active, although on a relatively small portion of the data)
>
> Statistics after manual analyze:
>     tablename     |   attname    | null_frac | avg_width | n_distinct | correlation |                   most_common_freqs
> ------------------+--------------+-----------+-----------+------------+-------------+--------------------------------------------------------
>  sleeping_intents | id           |         0 |        16 |         -1 | -0.00133045 | [null]
>  sleeping_intents | status       |         0 |         9 |          6 |    0.848468 | {0.918343,0.0543667,0.0267567,0.000513333,1e-05,1e-05}
>  sleeping_intents | created_at   |         0 |         8 |         -1 |    0.993599 | [null]
>  sleeping_intents | raptor_after | 0.0663433 |         8 |  -0.933657 |     0.99392 | [null]
>
> In a previous go-round with this table, I also increased the
> statistics target for the id column to 1000, vs. the database default
> of 100.
>
> Originally I was mostly interested in trying to understand why it
> would choose something other than the pk index, which sort of feels
> like a bug; what could be faster than seeking an individual entry in
> a pk index? There are cases where it might make sense, but this
> doesn't seem like one (even accounting for the infrequency of the
> status we are looking for, which is 1e-05, the disparity in index
> size should push it back to the pk imho, unless I am not thinking
> through correlation enough?).
>
> However, it also seems very odd that this problem occurs at all. The
> last couple of times it has happened, the manual analyze was run
> within ~30-45 minutes of the auto-analyze, and while the data is
> changing, it isn't changing so rapidly that this should make a
> significant difference; I don't see any other reason that autoanalyze
> would produce a different result than a manual analyze.
>
> All that said, any insight on the above two items would be great, but
> the most immediate concern is: what would you suggest to prevent this
> from happening again?
>
> Thanks in advance,
>
> Robert Treat
> https://xzilla.net
>
