AWS RDS PostgreSQL CPU Spiking to 100%

From: aditya desai <admad123(at)gmail(dot)com>
To: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: AWS RDS PostgreSQL CPU Spiking to 100%
Date: 2020-09-08 13:33:26
Message-ID: CAN0SRDHiuJC2ZgWr=aXrBBRAtome7sVua1=YkKZEzj_zy03t5Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi,
We have an application where one of the APIs calling queries(attached) is
spiking the CPU to 100% during load testing.
However, queries are making use of indexes(Bitmap Index and Bitmap Heap
scan though). When run separately on DB queries hardly take less than 200
ms. Is CPU spiking due to Bitmap Heap Scan?
These queries are being called thousands of times. Application team says
they have handled connection pooling from the Application side. So there is
no connection pooling here from DB side. Current db instance size is
"db.m4.4xlarge"
64 GB RAM 16 vCPU".
The Application dev team has primary keys and foreign keys on tables so
they are unable to partition the tables as well due to limitations of
postgres partitioning. Columns in WHERE clauses are not constant in all
queries to decide partition keys.

1. Does DB need more CPU considering this kind of load?
2. Can the query be tuned further? It is already using indexes(Bitmap
though).
3. Will connection pooling resolve the CPU Spike issues?

Also pasting Query and plans below.

----------------------exampleCount 1. Without
internalexamplecode-----------------------

lmp_examples=> explain analyze with exampleCount as ( select
examplestatuscode from example j where 1=1 and j.countrycode = 'AD' and
j.facilitycode in ('ABCD') and j.internalexamplecode in
('005','006','007','005') and ((j.examplestartdatetime between '2020-05-18
00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )
group by j.examplestatuscode)
lmp_examples-> select js.examplestatuscode,COALESCE(count(*),0) stat_count
from exampleCount jc right outer join examplestatus js on
jc.examplestatuscode=js.examplestatuscode group by js.examplestatuscode ;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=79353.80..79353.89 rows=9 width=12) (actual
time=88.847..88.850 rows=9 loops=1)
Group Key: js.examplestatuscode
CTE examplecount
-> HashAggregate (cost=79352.42..79352.46 rows=4 width=4) (actual
time=88.803..88.805 rows=5 loops=1)
Group Key: j.examplestatuscode
-> Bitmap Heap Scan on example j (cost=1547.81..79251.08
rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
Recheck Cond: ((((countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
'2020-08-19 00:00:00'::timestamp without time zone)) OR
(examplestartdatetime IS NULL))
Filter: (((countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
ANY ('{005,006,007,005}'::text[])))
Rows Removed by Filter: 3
Heap Blocks: exact=18307
-> BitmapOr (cost=1547.81..1547.81 rows=40538 width=0)
(actual time=15.707..15.707 rows=0 loops=1)
-> Bitmap Index Scan on example_list9_idx
(cost=0.00..1523.10 rows=40538 width=0) (actual time=15.702..15.702
rows=62851 loops=1)
Index Cond: (((countrycode)::text =
'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND
((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND
(examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time
zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without
time zone))
-> Bitmap Index Scan on example_list10_idx
(cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
Index Cond: (examplestartdatetime IS NULL)
-> Hash Left Join (cost=0.13..1.29 rows=9 width=4) (actual
time=88.831..88.840 rows=9 loops=1)
Hash Cond: ((js.examplestatuscode)::text =
(jc.examplestatuscode)::text)
-> Seq Scan on examplestatus js (cost=0.00..1.09 rows=9 width=4)
(actual time=0.004..0.007 rows=9 loops=1)
-> Hash (cost=0.08..0.08 rows=4 width=16) (actual
time=88.817..88.817 rows=5 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> CTE Scan on examplecount jc (cost=0.00..0.08 rows=4
width=16) (actual time=88.807..88.812 rows=5 loops=1)
Planning Time: 0.979 ms
Execution Time: 89.036 ms
(23 rows)

----------------exampleCount 2. With
internalexamplecode---------------------------------

lmp_examples=> explain analyze with exampleCount as ( select
examplestatuscode,count(1) stat_count from example j where 1=1 and
j.countrycode = 'AD' and j.facilitycode in ('ABCD') and
j.internalexamplecode in ('005','006','007','005') and
((j.examplestartdatetime between '2020-05-18 00:00:00' and '2020-08-19
00:00:00' ) or j.examplestartdatetime IS NULL ) group by
j.examplestatuscode)
lmp_examples-> select js.examplestatuscode,COALESCE(stat_count,0)
stat_count from exampleCount jc right outer join examplestatus js on
jc.examplestatuscode=js.examplestatuscode;

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=79453.94..79455.10 rows=9 width=12) (actual
time=89.660..89.669 rows=9 loops=1)
Hash Cond: ((js.examplestatuscode)::text = (jc.examplestatuscode)::text)
CTE examplecount
-> HashAggregate (cost=79453.77..79453.81 rows=4 width=12) (actual
time=89.638..89.640 rows=5 loops=1)
Group Key: j.examplestatuscode
-> Bitmap Heap Scan on example j (cost=1547.81..79251.08
rows=40538 width=4) (actual time=18.193..69.710 rows=62851 loops=1)
Recheck Cond: ((((countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
'2020-08-19 00:00:00'::timestamp without time zone)) OR
(examplestartdatetime IS NULL))
Filter: (((countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
ANY ('{005,006,007,005}'::text[])))
Rows Removed by Filter: 3
Heap Blocks: exact=18307
-> BitmapOr (cost=1547.81..1547.81 rows=40538 width=0)
(actual time=15.483..15.483 rows=0 loops=1)
-> Bitmap Index Scan on example_list9_idx
(cost=0.00..1523.10 rows=40538 width=0) (actual time=15.477..15.478
rows=62851 loops=1)
Index Cond: (((countrycode)::text =
'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND
((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND
(examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time
zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without
time zone))
-> Bitmap Index Scan on example_list10_idx
(cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
Index Cond: (examplestartdatetime IS NULL)
-> Seq Scan on examplestatus js (cost=0.00..1.09 rows=9 width=4)
(actual time=0.003..0.005 rows=9 loops=1)
-> Hash (cost=0.08..0.08 rows=4 width=24) (actual time=89.650..89.651
rows=5 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> CTE Scan on examplecount jc (cost=0.00..0.08 rows=4 width=24)
(actual time=89.641..89.647 rows=5 loops=1)
Planning Time: 0.470 ms
Execution Time: 89.737 ms

------------------------exampleSelect-----------------------------------

lmp_examples=> explain analyze select j.id from example j where 1=1 and
j.countrycode = 'AD' and j.facilitycode in ('ABCD') and
j.examplestatuscode in ('101') and j.internalexamplecode in
('005','006','007','005') and ((j.examplestartdatetime between '2020-05-18
00:00:00' and '2020-08-19 00:00:00') or j.examplestartdatetime IS NULL)
ORDER BY createddate DESC limit 10;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=71286.65..71286.68 rows=10 width=12) (actual
time=47.351..47.359 rows=10 loops=1)
-> Sort (cost=71286.65..71335.31 rows=19462 width=12) (actual
time=47.349..47.352 rows=10 loops=1)
Sort Key: createddate DESC
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on example j (cost=1176.77..70866.09
rows=19462 width=12) (actual time=15.133..46.555 rows=2530 loops=1)
Recheck Cond: (((countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((examplestatuscode)::text =
'101'::text) AND ((internalexamplecode)::text = ANY
('{005,006,007,005}'::text[])))
Filter: (((examplestartdatetime >= '2020-05-18
00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
'2020-08-19 00:00:00'::timestamp without time zone)) OR
(examplestartdatetime IS NULL))
Rows Removed by Filter: 38724
Heap Blocks: exact=20923
-> Bitmap Index Scan on example_list1_idx
(cost=0.00..1171.90 rows=33211 width=0) (actual time=9.938..9.939
rows=41254 loops=1)
Index Cond: (((countrycode)::text = 'AD'::text) AND
((facilitycode)::text = 'ABCD'::text) AND ((examplestatuscode)::text =
'101'::text) AND ((internalexamplecode)::text = ANY
('{005,006,007,005}'::text[])))
Planning Time: 0.398 ms
Execution Time: 47.416 ms

Regards,
Aditya.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pavel Stehule 2020-09-08 14:36:02 Re: AWS RDS PostgreSQL CPU Spiking to 100%
Previous Message Justin Pryzby 2020-09-08 10:49:22 Re: Query Performance in bundled requests