Re: BUG #17710: Slow queries (100% CPU) after auto-vacuum

From: Jean-Francois Levesque <jf(dot)levesque(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #17710: Slow queries (100% CPU) after auto-vacuum
Date: 2022-12-12 14:37:10
Message-ID: CADYo02giUPgqhMjv4QhVDfKV7RJi-QNfRiLQHrt7chX92ixQdg@mail.gmail.com
Lists: pgsql-bugs

Hi Tom,

Thanks for the wiki page. Here are the EXPLAIN ANALYZE results for the query.

The EXPLAINs were executed inside the transaction, at the end of the test,
just before the rollback. I initially ran them outside the transaction, but
the plans always came back clean there, so the problem only shows up inside
the transaction.
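
In outline, the capture looks like this (a simplified sketch: the real
statement is the application-generated UPDATE behind the plans below, and the
SET clause and owner_id value here are only illustrative):

BEGIN;

-- ... the test's own inserts/updates on these tables run here ...

-- just before the test's ROLLBACK:
EXPLAIN ANALYZE
UPDATE shared_models_session
SET id = id                      -- placeholder; the real statement sets the actual session columns
WHERE id IN (
    SELECT v0.id
    FROM shared_models_session v0
    JOIN shared_models_sessionmonitoring v1 ON v1.session_id = v0.id
    JOIN shared_models_monitoring u0 ON u0.id = v1.monitoring_id
    WHERE u0.owner_id = 6
    FOR UPDATE OF v0
);

ROLLBACK;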

FYI, I also see this issue with v12, v13, and v14. I have not been able to
test v15 yet.

Thanks,

Jean-Francois

------- v10 -------

v1/shared_models/tests/test_session.py

Update on shared_models_session  (cost=46.88..175.60 rows=16 width=530) (actual time=0.003..0.004 rows=0 loops=1)
  ->  Nested Loop  (cost=46.88..175.60 rows=16 width=530) (actual time=0.003..0.003 rows=0 loops=1)
        ->  HashAggregate  (cost=46.60..46.76 rows=16 width=32) (actual time=0.003..0.003 rows=0 loops=1)
              Group Key: "ANY_subquery".id
              ->  Subquery Scan on "ANY_subquery"  (cost=4.79..46.56 rows=16 width=32) (actual time=0.002..0.002 rows=0 loops=1)
                    ->  LockRows  (cost=4.79..46.40 rows=16 width=22) (actual time=0.002..0.002 rows=0 loops=1)
                          ->  Nested Loop  (cost=4.79..46.24 rows=16 width=22) (actual time=0.002..0.002 rows=0 loops=1)
                                ->  Nested Loop  (cost=4.51..27.86 rows=16 width=16) (actual time=0.002..0.002 rows=0 loops=1)
                                      ->  Index Scan using shared_models_monitoring_owner_id_aedf0b23 on shared_models_monitoring u0  (cost=0.14..8.16 rows=1 width=10) (actual time=0.002..0.002 rows=0 loops=1)
                                            Index Cond: (owner_id = 28)
                                      ->  Bitmap Heap Scan on shared_models_sessionmonitoring v1  (cost=4.37..19.59 rows=11 width=14) (never executed)
                                            Recheck Cond: (monitoring_id = u0.id)
                                            ->  Bitmap Index Scan on shared_models_sessionmonitoring_monitoring_id_e9953e3e  (cost=0.00..4.36 rows=11 width=0) (never executed)
                                                  Index Cond: (monitoring_id = u0.id)
                                ->  Index Scan using shared_models_session_pkey on shared_models_session v0  (cost=0.28..1.15 rows=1 width=10) (never executed)
                                      Index Cond: (id = v1.session_id)
        ->  Index Scan using shared_models_session_pkey on shared_models_session  (cost=0.28..8.05 rows=1 width=453) (never executed)
              Index Cond: (id = "ANY_subquery".id)
Planning time: 0.116 ms
Execution time: 0.032 ms

------- v11 - normal speed -------

v1/shared_models/tests/test_session.py

Update on shared_models_session  (cost=45.54..170.25 rows=16 width=530) (actual time=0.006..0.007 rows=0 loops=1)
  ->  Nested Loop  (cost=45.54..170.25 rows=16 width=530) (actual time=0.006..0.006 rows=0 loops=1)
        ->  HashAggregate  (cost=45.25..45.41 rows=16 width=32) (actual time=0.005..0.006 rows=0 loops=1)
              Group Key: "ANY_subquery".id
              ->  Subquery Scan on "ANY_subquery"  (cost=4.79..45.21 rows=16 width=32) (actual time=0.004..0.005 rows=0 loops=1)
                    ->  LockRows  (cost=4.79..45.05 rows=16 width=22) (actual time=0.004..0.005 rows=0 loops=1)
                          ->  Nested Loop  (cost=4.79..44.89 rows=16 width=22) (actual time=0.004..0.004 rows=0 loops=1)
                                ->  Nested Loop  (cost=4.51..27.86 rows=16 width=16) (actual time=0.004..0.004 rows=0 loops=1)
                                      ->  Index Scan using shared_models_monitoring_owner_id_aedf0b23 on shared_models_monitoring u0  (cost=0.14..8.16 rows=1 width=10) (actual time=0.004..0.004 rows=0 loops=1)
                                            Index Cond: (owner_id = 2)
                                      ->  Bitmap Heap Scan on shared_models_sessionmonitoring v1  (cost=4.37..19.59 rows=11 width=14) (never executed)
                                            Recheck Cond: (monitoring_id = u0.id)
                                            ->  Bitmap Index Scan on shared_models_sessionmonitoring_monitoring_id_e9953e3e  (cost=0.00..4.36 rows=11 width=0) (never executed)
                                                  Index Cond: (monitoring_id = u0.id)
                                ->  Index Scan using shared_models_session_pkey on shared_models_session v0  (cost=0.28..1.06 rows=1 width=10) (never executed)
                                      Index Cond: (id = v1.session_id)
        ->  Index Scan using shared_models_session_pkey on shared_models_session  (cost=0.28..7.80 rows=1 width=453) (never executed)
              Index Cond: (id = "ANY_subquery".id)
Planning Time: 0.162 ms
Execution Time: 0.061 ms

------- v11 - slow query -------

v1/shared_models/tests/test_session.py

Update on shared_models_session  (cost=0.90..33.02 rows=1 width=530) (actual time=86918.598..86918.600 rows=0 loops=1)
  ->  Nested Loop Semi Join  (cost=0.90..33.02 rows=1 width=530) (actual time=86918.597..86918.598 rows=0 loops=1)
        Join Filter: (shared_models_session.id = "ANY_subquery".id)
        ->  Index Scan using shared_models_session_pkey on shared_models_session  (cost=0.25..8.26 rows=1 width=453) (actual time=0.005..1.762 rows=488 loops=1)
        ->  Subquery Scan on "ANY_subquery"  (cost=0.65..24.74 rows=1 width=32) (actual time=178.107..178.107 rows=0 loops=488)
              ->  LockRows  (cost=0.65..24.73 rows=1 width=22) (actual time=178.107..178.107 rows=0 loops=488)
                    ->  Nested Loop  (cost=0.65..24.72 rows=1 width=22) (actual time=178.106..178.106 rows=0 loops=488)
                          Join Filter: (v1.monitoring_id = u0.id)
                          ->  Nested Loop  (cost=0.50..16.54 rows=1 width=20) (actual time=0.537..177.918 rows=488 loops=488)
                                Join Filter: (v0.id = v1.session_id)
                                Rows Removed by Join Filter: 118828
                                ->  Index Scan using shared_models_sessionmonitoring_monitoring_id_e9953e3e on shared_models_sessionmonitoring v1  (cost=0.25..8.26 rows=1 width=14) (actual time=0.004..0.116 rows=488 loops=488)
                                ->  Index Scan using shared_models_session_pkey on shared_models_session v0  (cost=0.25..8.26 rows=1 width=10) (actual time=0.002..0.354 rows=244 loops=238144)
                          ->  Index Scan using shared_models_monitoring_owner_id_aedf0b23 on shared_models_monitoring u0  (cost=0.14..8.16 rows=1 width=10) (actual time=0.000..0.000 rows=0 loops=238144)
                                Index Cond: (owner_id = 6)
Planning Time: 0.111 ms
Execution Time: 86918.632 ms
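
For reference, the time autovacuum/autoanalyze last touched these tables can
be checked from the standard pg_stat_user_tables view, to correlate with the
moment the query turns slow. Something like:

SELECT relname, last_autovacuum, last_autoanalyze, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname IN ('shared_models_session',
                  'shared_models_sessionmonitoring',
                  'shared_models_monitoring')
ORDER BY relname;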

On Fri, Dec 9, 2022 at 9:13 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> > [ query gets slower after auto-vacuum ]
>
> This sounds less like an auto-vacuum problem than an auto-analyze
> problem. That is, probably a background update of table statistics
> is happening and that is triggering the planner to pick a different
> plan. In an ideal world, more up-to-date stats would always result
> in a better plan, but reality is not always so kind.
>
> There is not enough info here to diagnose the problem though.
> At minimum we need to see EXPLAIN ANALYZE output for the problem
> query in both the fast and slow states. We have a wiki page about
> reporting slow-query problems [1] which asks for quite a bit of
> info, but for starters maybe the EXPLAIN ANALYZE results would
> be enough.
>
> > I am having an issue that started with 11.0 and is still present with
> 11.18.
> > The issue is not present with 10.23.
>
> If you could get EXPLAIN ANALYZE results from v10 as well as v11,
> that would be useful too.
>
> regards, tom lane
>
> [1] https://wiki.postgresql.org/wiki/Slow_Query_Questions
>
