Re: BUG #17330: EXPLAIN hangs and very long query plans

From: Strahinja Kustudić <strahinjak(at)nordeus(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17330: EXPLAIN hangs and very long query plans
Date: 2021-12-10 15:52:45
Message-ID: CADKbJJVUfzb6BGUfH-zmvSJArfhPUFUGpM+ctHquQjF5WzeTTA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Dec 10, 2021 at 10:53 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
wrote:

> Hi
>
> On Fri, Dec 10, 2021 at 10:25, Strahinja Kustudić <
> strahinjak(at)nordeus(dot)com> wrote:
>
>> On Fri, Dec 10, 2021 at 2:36 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>
>>> Sounds like a problem with get_actual_variable_range(), which can scan
>>> indexes at plan time to determine minimum or maximum values.
>>>
>>> This actually has been improved quite a bit since Postgres 10. So as
>>> Jeff said, seems like you might benefit from upgrading to a newer
>>> major version. v11 has improved things in this exact area.
>>>
>>
>> On my Docker instance when I execute EXPLAIN it starts reading a lot of
>> data. The indexes of the biggest table the query reads are 50GB, so my
>> guess is that it reads those indexes.
>>
>> I allowed EXPLAIN in Docker to finish and it took almost 500 seconds and
>> it was reading data all the time. After I reindexed the biggest table,
>> EXPLAIN finished instantly. Can index corruption cause this?
>>
>
> The index can be bloated.
>

Is bloat alone enough to make it this slow, or does the index need to be corrupted?
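
For reference, here is how I plan to check how bloated the index actually is
(a minimal sketch assuming the pgstattuple extension is available; the index
name is just a placeholder):

CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- For a btree index, a low avg_leaf_density together with many
-- empty/deleted pages points to bloat rather than corruption:
SELECT avg_leaf_density, empty_pages, deleted_pages
FROM pgstatindex('my_big_index');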

>
>> Note that this started happening in production after we deleted a few
>> million rows from the biggest table.
>>
>
> it is maybe the same issue
>
>
> https://www.postgresql.org/message-id/9ADAF560-F331-4AF8-A28A-F005070C3187%40yandex.ru
>

It looks similar, because in that thread there were also deletes after which
query planning went up to a few seconds.
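
If it is that issue, my understanding is that we hit roughly this pattern (a
made-up sketch with placeholder names, not our real schema):

-- A mass delete at the high end of an indexed column leaves millions
-- of dead index entries behind until (auto)vacuum cleans them up.
CREATE TABLE t (id bigint PRIMARY KEY);
INSERT INTO t SELECT g FROM generate_series(1, 10000000) g;
ANALYZE t;
DELETE FROM t WHERE id > 5000000;

-- While those dead entries linger, planning a query whose range
-- condition falls near the (now stale) histogram boundary makes the
-- planner look up the column's actual maximum through the index
-- (get_actual_variable_range), wading through the dead entries:
EXPLAIN SELECT * FROM t WHERE id > 9999999;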

I just upgraded Postgres to 12 with pg_upgrade and ran ANALYZE on the whole
DB. The issue is still there, although EXPLAIN now finishes in 19 seconds
instead of 500. Here is the whole EXPLAIN ANALYZE output (I added ANALYZE so
that it displays the planning duration):

Sort  (cost=1946.04..1946.16 rows=46 width=642) (actual time=10.929..10.932 rows=20 loops=1)
  Sort Key: m.delivery_start DESC
  Sort Method: quicksort  Memory: 44kB
  ->  Nested Loop Left Join  (cost=10.19..1944.77 rows=46 width=642) (actual time=2.305..10.401 rows=20 loops=1)
        Join Filter: ((c.game_name = csu.game_name) AND (um.user_id = csu.user_id) AND (c.id = csu.campaign_id))
        Filter: ((c.id IS NULL) OR (c.active AND (now() >= c.campaign_start) AND ((now() <= c.campaign_end) OR (c.campaign_end IS NULL)) AND ((c.segment_ids <@ '{5632,-2,2,5634,9731,5640,5641,9738,13,3599,5648,5655,13343,5664,13344,13345,13346,13347,13348,13350,13351,13352,5673,13353,13354,1067,13355,13356,13357,13358,13359,5680,13360,6705,13361,13362,13363,13364,13365,13366,5687,13367,13368,13369,20538,13370,13371,13372,13373,6718,13374,19007,19008,13378,5699,13379,13380,19013,3142,3143,19015,13385,5707,6731,20555,19019,3149,19021,3150,591,19023,592,19024,6737,19027,19028,5717,6743,17496,19033,13402,19035,6750,17502,19039,17504,19041,19043,19044,17508,6757,17510,19047,104,19048,17512,17514,7788,6764,621,19053,623,19055,7792,625,19059,7796,1141,19061,19063,7800,19064,5241,19067,19068,3713,19073,1667,19075,1668,19079,19081,19083,19084,5262,19087,19088,8338,8339,7828,19093,20631,19095,20632,20633,12955,19099,19101,19103,19104,19107,19108,19113,19115,8876,19119,3760,19121,19123,19124,3785,4814,16605,4324,16622,16623,21253,21254,4359,21258,5901,5915,4900,22832,22835,22836,9013,3895,3902,1865,1866,1379,1380,4978,7027,13685,14203,2428,2429,21887,21888,4485,22932,22933,21911,21417,432,434,20418,968,969,470,7642,991,992,10209,1512,1513,23024,9201,9207,23032,9208,9214}'::integer[]) OR (csu.user_id = '-XXXXXXXXXXXXXXXXXX'::bigint))))
        Rows Removed by Filter: 11
        ->  Nested Loop  (cost=10.03..1910.89 rows=163 width=696) (actual time=2.029..10.092 rows=31 loops=1)
              ->  Hash Left Join  (cost=9.46..695.80 rows=61 width=684) (actual time=0.729..5.991 rows=55 loops=1)
                    Hash Cond: ((m.game_name = c.game_name) AND (m.campaign_id = c.id))
                    ->  Index Scan using mail_pkey on mail m  (cost=0.27..686.29 rows=61 width=552) (actual time=0.297..5.524 rows=55 loops=1)
                          Filter: (active AND (game_name = 'topeleven'::game_name) AND (now() >= delivery_start) AND ((now() <= delivery_end) OR (delivery_end IS NULL)))
                          Rows Removed by Filter: 97
                    ->  Hash  (cost=7.83..7.83 rows=91 width=136) (actual time=0.411..0.412 rows=91 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 26kB
                          ->  Seq Scan on campaign c  (cost=0.00..7.83 rows=91 width=136) (actual time=0.265..0.371 rows=91 loops=1)
                                Filter: (game_name = 'topeleven'::game_name)
                                Rows Removed by Filter: 55
              ->  Index Scan using user_mail_pkey on user_mail um  (cost=0.57..19.88 rows=4 width=24) (actual time=0.074..0.074 rows=1 loops=55)
                    Index Cond: ((user_id = '-XXXXXXXXXXXXXXXXXX'::bigint) AND (game_name = 'topeleven'::game_name) AND (mail_id = m.id))
                    Filter: ((m.mail_type <> 'question'::mail_type) OR (mail_status <> 'completed'::mail_status))
        ->  Materialize  (cost=0.15..28.18 rows=1 width=16) (actual time=0.009..0.009 rows=0 loops=31)
              ->  Index Only Scan using campaign_static_users_pkey on campaign_static_users csu  (cost=0.15..28.17 rows=1 width=16) (actual time=0.266..0.266 rows=0 loops=1)
                    Index Cond: ((user_id = '-XXXXXXXXXXXXXXXXXX'::bigint) AND (game_name = 'topeleven'::game_name))
                    Heap Fetches: 0
Planning Time: 19408.862 ms
Execution Time: 10.978 ms
(27 rows)

Just to mention again, reindexing the biggest table fixes the issue, but I'm
more curious why this happened and how we can prevent it from happening
again. Or is it already fixed so that it cannot happen in the future?
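
In the meantime we will keep the workaround scripted, roughly like this
(a sketch; the table and index names are placeholders, and REINDEX ...
CONCURRENTLY needs 12 or later):

-- Rebuild the bloated index without blocking writes (new in 12):
REINDEX INDEX CONCURRENTLY my_big_index;
-- Make autovacuum more aggressive on this table so that dead index
-- entries are cleaned up sooner after large deletes:
ALTER TABLE my_big_table SET (autovacuum_vacuum_scale_factor = 0.01);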

Regards,
Strahinja
