Re: BUG #17330: EXPLAIN hangs and very long query plans

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Strahinja Kustudić <strahinjak(at)nordeus(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #17330: EXPLAIN hangs and very long query plans
Date: 2021-12-10 16:33:55
Message-ID: CAFj8pRBhVvmPtyT5HLEF8RW=g+K1g2D-RMcPuHzqeHYbLK8Oqw@mail.gmail.com
Lists: pgsql-bugs

On Fri, Dec 10, 2021 at 16:52, Strahinja Kustudić
<strahinjak(at)nordeus(dot)com> wrote:

> On Fri, Dec 10, 2021 at 10:53 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
> wrote:
>
>> Hi
>>
>> On Fri, Dec 10, 2021 at 10:25, Strahinja Kustudić
>> <strahinjak(at)nordeus(dot)com> wrote:
>>
>>> On Fri, Dec 10, 2021 at 2:36 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>>
>>>> Sounds like a problem with get_actual_variable_range(), which can scan
>>>> indexes at plan time to determine minimum or maximum values.
>>>>
>>>> This actually has been improved quite a bit since Postgres 10. So as
>>>> Jeff said, seems like you might benefit from upgrading to a newer
>>>> major version. v11 has improved things in this exact area.
>>>>
>>>
>>> On my Docker instance when I execute EXPLAIN it starts reading a lot of
>>> data. The indexes of the biggest table the query reads are 50GB, so my
>>> guess is that it reads those indexes.
>>>
>>> I let EXPLAIN in Docker finish; it took almost 500 seconds and it was
>>> reading data the whole time. After I reindexed the biggest table,
>>> EXPLAIN finished instantly. Can index corruption cause this?
>>>
>>
>> The index can be bloated.
>>
>
> Is bloat alone enough to make it slower, or does the index need to be
> corrupted?
>

Corruption and bloat are two very different things. A bloated index is
slower; a corrupted index returns wrong results.
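
For example, you can check both (just a sketch, assuming the pgstattuple
and amcheck extensions can be installed, and using one of the index names
from your plan as a placeholder):

  CREATE EXTENSION IF NOT EXISTS pgstattuple;
  CREATE EXTENSION IF NOT EXISTS amcheck;

  -- bloat: a low avg_leaf_density / high leaf_fragmentation points to a bloated index
  SELECT avg_leaf_density, leaf_fragmentation FROM pgstatindex('mail_pkey');

  -- corruption: raises an error if the B-tree structure is broken
  SELECT bt_index_check('mail_pkey');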

>
>>
>>> Note that this started happening in production after we deleted a few
>>> million rows from the biggest table.
>>>
>>
>> It is maybe the same issue:
>>
>>
>> https://www.postgresql.org/message-id/9ADAF560-F331-4AF8-A28A-F005070C3187%40yandex.ru
>>
>
> Looks similar, because there were deletes after which query planning went
> up to a few seconds.
>
> I just upgraded Postgres to 12 with pg_upgrade and ran ANALYZE on the whole
> DB, and the issue is similar: EXPLAIN now finishes in 19 seconds instead of
> 500 seconds, which means the issue is still there. Here is the whole
> EXPLAIN ANALYZE (I added ANALYZE so that it displays the planning duration):
>
> Sort  (cost=1946.04..1946.16 rows=46 width=642) (actual time=10.929..10.932 rows=20 loops=1)
>   Sort Key: m.delivery_start DESC
>   Sort Method: quicksort  Memory: 44kB
>   ->  Nested Loop Left Join  (cost=10.19..1944.77 rows=46 width=642) (actual time=2.305..10.401 rows=20 loops=1)
>         Join Filter: ((c.game_name = csu.game_name) AND (um.user_id = csu.user_id) AND (c.id = csu.campaign_id))
>         Filter: ((c.id IS NULL) OR (c.active AND (now() >= c.campaign_start) AND ((now() <= c.campaign_end) OR (c.campaign_end IS NULL)) AND ((c.segment_ids <@ '{5632,-2,2,5634,9731,5640,5641,9738,13,3599,5648,5655,13343,5664,13344,13345,13346,13347,13348,13350,13351,13352,5673,13353,13354,1067,13355,13356,13357,13358,13359,5680,13360,6705,13361,13362,13363,13364,13365,13366,5687,13367,13368,13369,20538,13370,13371,13372,13373,6718,13374,19007,19008,13378,5699,13379,13380,19013,3142,3143,19015,13385,5707,6731,20555,19019,3149,19021,3150,591,19023,592,19024,6737,19027,19028,5717,6743,17496,19033,13402,19035,6750,17502,19039,17504,19041,19043,19044,17508,6757,17510,19047,104,19048,17512,17514,7788,6764,621,19053,623,19055,7792,625,19059,7796,1141,19061,19063,7800,19064,5241,19067,19068,3713,19073,1667,19075,1668,19079,19081,19083,19084,5262,19087,19088,8338,8339,7828,19093,20631,19095,20632,20633,12955,19099,19101,19103,19104,19107,19108,19113,19115,8876,19119,3760,19121,19123,19124,3785,4814,16605,4324,16622,16623,21253,21254,4359,21258,5901,5915,4900,22832,22835,22836,9013,3895,3902,1865,1866,1379,1380,4978,7027,13685,14203,2428,2429,21887,21888,4485,22932,22933,21911,21417,432,434,20418,968,969,470,7642,991,992,10209,1512,1513,23024,9201,9207,23032,9208,9214}'::integer[]) OR (csu.user_id = '-XXXXXXXXXXXXXXXXXX'::bigint))))
>         Rows Removed by Filter: 11
>         ->  Nested Loop  (cost=10.03..1910.89 rows=163 width=696) (actual time=2.029..10.092 rows=31 loops=1)
>               ->  Hash Left Join  (cost=9.46..695.80 rows=61 width=684) (actual time=0.729..5.991 rows=55 loops=1)
>                     Hash Cond: ((m.game_name = c.game_name) AND (m.campaign_id = c.id))
>                     ->  Index Scan using mail_pkey on mail m  (cost=0.27..686.29 rows=61 width=552) (actual time=0.297..5.524 rows=55 loops=1)
>                           Filter: (active AND (game_name = 'topeleven'::game_name) AND (now() >= delivery_start) AND ((now() <= delivery_end) OR (delivery_end IS NULL)))
>                           Rows Removed by Filter: 97
>                     ->  Hash  (cost=7.83..7.83 rows=91 width=136) (actual time=0.411..0.412 rows=91 loops=1)
>                           Buckets: 1024  Batches: 1  Memory Usage: 26kB
>                           ->  Seq Scan on campaign c  (cost=0.00..7.83 rows=91 width=136) (actual time=0.265..0.371 rows=91 loops=1)
>                                 Filter: (game_name = 'topeleven'::game_name)
>                                 Rows Removed by Filter: 55
>               ->  Index Scan using user_mail_pkey on user_mail um  (cost=0.57..19.88 rows=4 width=24) (actual time=0.074..0.074 rows=1 loops=55)
>                     Index Cond: ((user_id = '-XXXXXXXXXXXXXXXXXX'::bigint) AND (game_name = 'topeleven'::game_name) AND (mail_id = m.id))
>                     Filter: ((m.mail_type <> 'question'::mail_type) OR (mail_status <> 'completed'::mail_status))
>         ->  Materialize  (cost=0.15..28.18 rows=1 width=16) (actual time=0.009..0.009 rows=0 loops=31)
>               ->  Index Only Scan using campaign_static_users_pkey on campaign_static_users csu  (cost=0.15..28.17 rows=1 width=16) (actual time=0.266..0.266 rows=0 loops=1)
>                     Index Cond: ((user_id = '-XXXXXXXXXXXXXXXXXX'::bigint) AND (game_name = 'topeleven'::game_name))
>                     Heap Fetches: 0
> Planning Time: 19408.862 ms
> Execution Time: 10.978 ms
> (27 rows)
>
> Just to mention again, reindexing the biggest table fixes the issue, but
> I'm more curious why this happened and how we can prevent it from happening
> again. Or is it already fixed so that it cannot happen in the future?
>

Maybe VACUUM can solve this issue too.
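
For example (only a sketch - "user_mail" here stands for whatever the big
table is in your schema):

  -- reclaim the dead index entries left behind by the mass delete
  VACUUM (VERBOSE, ANALYZE) user_mail;

  -- check whether autovacuum has processed the table recently
  SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
    FROM pg_stat_user_tables
   WHERE relname = 'user_mail';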

A massive delete is a common issue - you can try to use partitioning and
then, instead of a massive delete, just drop the old partition.
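
Something like this (only a rough sketch with made-up columns and partition
bounds - adapt it to your schema):

  -- hypothetical range-partitioned version of the big table
  CREATE TABLE mail_part (
      id bigint NOT NULL,
      delivery_start timestamptz NOT NULL,
      body text
  ) PARTITION BY RANGE (delivery_start);

  CREATE TABLE mail_part_2021_11 PARTITION OF mail_part
      FOR VALUES FROM ('2021-11-01') TO ('2021-12-01');
  CREATE TABLE mail_part_2021_12 PARTITION OF mail_part
      FOR VALUES FROM ('2021-12-01') TO ('2022-01-01');

  -- instead of DELETE ... WHERE delivery_start < '2021-12-01', just:
  DROP TABLE mail_part_2021_11;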

Regards

Pavel

> Regards,
> Strahinja
>
