Re: AWS RDS PostgreSQL CPU Spiking to 100%

From: aditya desai <admad123(at)gmail(dot)com>
To: Prince Pathria <prince(dot)pathria(at)goevive(dot)com>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: AWS RDS PostgreSQL CPU Spiking to 100%
Date: 2020-09-30 07:13:50
Message-ID: CAN0SRDHavnJPgY3TAxPhL38aC4NYhdBu0_1MC6WVFW-K8Rcpug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Thanks, I'll check it out.

On Mon, Sep 28, 2020 at 9:40 PM Prince Pathria <prince(dot)pathria(at)goevive(dot)com>
wrote:

> We faced a similar issue, adding RDS proxy in front of RDS Postgres can
> help.
> In our situation, there were a lot of connects/disconnects from Lambda
> functions although concurrency of Lambda was 100 only.
> And adding connection pooler(RDS proxy) helped us to reduce the CPU load
> from 100% to 30%
>
> Happy to help :)
> Prince Pathria Systems Engineer | Certified Kubernetes Administrator |
> AWS Certified Solutions Architect Evive +91 9478670472 goevive.com
>
>
> On Mon, Sep 28, 2020 at 9:21 PM aditya desai <admad123(at)gmail(dot)com> wrote:
>
>>
>>> Hi,
>>> We have an application where one of the APIs calling queries(attached)
>>> is spiking the CPU to 100% during load testing.
>>> However, queries are making use of indexes(Bitmap Index and Bitmap Heap
>>> scan though). When run separately on DB queries hardly take less than 200
>>> ms. Is CPU spiking due to Bitmap Heap Scan?
>>> These queries are being called thousands of times. Application team says
>>> they have handled connection pooling from the Application side. So there is
>>> no connection pooling here from DB side. Current db instance size is "db.m4.4xlarge"
>>> 64 GB RAM 16 vCPU".
>>> The Application dev team has primary keys and foreign keys on tables so
>>> they are unable to partition the tables as well due to limitations of
>>> postgres partitioning. Columns in WHERE clauses are not constant in all
>>> queries to decide partition keys.
>>>
>>> 1. Does DB need more CPU considering this kind of load?
>>> 2. Can the query be tuned further? It is already using indexes(Bitmap
>>> though).
>>> 3. Will connection pooling resolve the CPU Spike issues?
>>>
>>> Also pasting Query and plans below.
>>>
>>> ----------------------exampleCount 1. Without
>>> internalexamplecode-----------------------
>>>
>>> lmp_examples=> explain analyze with exampleCount as ( select
>>> examplestatuscode from example j where 1=1 and j.countrycode = 'AD' and
>>> j.facilitycode in ('ABCD') and j.internalexamplecode in
>>> ('005','006','007','005') and ((j.examplestartdatetime between '2020-05-18
>>> 00:00:00' and '2020-08-19 00:00:00' ) or j.examplestartdatetime IS NULL )
>>> group by j.examplestatuscode)
>>> lmp_examples-> select js.examplestatuscode,COALESCE(count(*),0)
>>> stat_count from exampleCount jc right outer join examplestatus js on
>>> jc.examplestatuscode=js.examplestatuscode group by js.examplestatuscode ;
>>>
>>>
>>> QUERY PLAN
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> HashAggregate (cost=79353.80..79353.89 rows=9 width=12) (actual
>>> time=88.847..88.850 rows=9 loops=1)
>>> Group Key: js.examplestatuscode
>>> CTE examplecount
>>> -> HashAggregate (cost=79352.42..79352.46 rows=4 width=4) (actual
>>> time=88.803..88.805 rows=5 loops=1)
>>> Group Key: j.examplestatuscode
>>> -> Bitmap Heap Scan on example j (cost=1547.81..79251.08
>>> rows=40538 width=4) (actual time=18.424..69.658 rows=62851 loops=1)
>>> Recheck Cond: ((((countrycode)::text = 'AD'::text) AND
>>> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
>>> ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
>>> 00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
>>> '2020-08-19 00:00:00'::timestamp without time zone)) OR
>>> (examplestartdatetime IS NULL))
>>> Filter: (((countrycode)::text = 'AD'::text) AND
>>> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
>>> ANY ('{005,006,007,005}'::text[])))
>>> Rows Removed by Filter: 3
>>> Heap Blocks: exact=18307
>>> -> BitmapOr (cost=1547.81..1547.81 rows=40538
>>> width=0) (actual time=15.707..15.707 rows=0 loops=1)
>>> -> Bitmap Index Scan on example_list9_idx
>>> (cost=0.00..1523.10 rows=40538 width=0) (actual time=15.702..15.702
>>> rows=62851 loops=1)
>>> Index Cond: (((countrycode)::text =
>>> 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND
>>> ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND
>>> (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time
>>> zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without
>>> time zone))
>>> -> Bitmap Index Scan on example_list10_idx
>>> (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
>>> Index Cond: (examplestartdatetime IS NULL)
>>> -> Hash Left Join (cost=0.13..1.29 rows=9 width=4) (actual
>>> time=88.831..88.840 rows=9 loops=1)
>>> Hash Cond: ((js.examplestatuscode)::text =
>>> (jc.examplestatuscode)::text)
>>> -> Seq Scan on examplestatus js (cost=0.00..1.09 rows=9
>>> width=4) (actual time=0.004..0.007 rows=9 loops=1)
>>> -> Hash (cost=0.08..0.08 rows=4 width=16) (actual
>>> time=88.817..88.817 rows=5 loops=1)
>>> Buckets: 1024 Batches: 1 Memory Usage: 9kB
>>> -> CTE Scan on examplecount jc (cost=0.00..0.08 rows=4
>>> width=16) (actual time=88.807..88.812 rows=5 loops=1)
>>> Planning Time: 0.979 ms
>>> Execution Time: 89.036 ms
>>> (23 rows)
>>>
>>>
>>> ----------------exampleCount 2. With
>>> internalexamplecode---------------------------------
>>>
>>>
>>> lmp_examples=> explain analyze with exampleCount as ( select
>>> examplestatuscode,count(1) stat_count from example j where 1=1 and
>>> j.countrycode = 'AD' and j.facilitycode in ('ABCD') and
>>> j.internalexamplecode in ('005','006','007','005') and
>>> ((j.examplestartdatetime between '2020-05-18 00:00:00' and '2020-08-19
>>> 00:00:00' ) or j.examplestartdatetime IS NULL ) group by
>>> j.examplestatuscode)
>>> lmp_examples-> select js.examplestatuscode,COALESCE(stat_count,0)
>>> stat_count from exampleCount jc right outer join examplestatus js on
>>> jc.examplestatuscode=js.examplestatuscode;
>>>
>>>
>>> QUERY PLAN
>>>
>>>
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Hash Left Join (cost=79453.94..79455.10 rows=9 width=12) (actual
>>> time=89.660..89.669 rows=9 loops=1)
>>> Hash Cond: ((js.examplestatuscode)::text =
>>> (jc.examplestatuscode)::text)
>>> CTE examplecount
>>> -> HashAggregate (cost=79453.77..79453.81 rows=4 width=12)
>>> (actual time=89.638..89.640 rows=5 loops=1)
>>> Group Key: j.examplestatuscode
>>> -> Bitmap Heap Scan on example j (cost=1547.81..79251.08
>>> rows=40538 width=4) (actual time=18.193..69.710 rows=62851 loops=1)
>>> Recheck Cond: ((((countrycode)::text = 'AD'::text) AND
>>> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
>>> ANY ('{005,006,007,005}'::text[])) AND (examplestartdatetime >= '2020-05-18
>>> 00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
>>> '2020-08-19 00:00:00'::timestamp without time zone)) OR
>>> (examplestartdatetime IS NULL))
>>> Filter: (((countrycode)::text = 'AD'::text) AND
>>> ((facilitycode)::text = 'ABCD'::text) AND ((internalexamplecode)::text =
>>> ANY ('{005,006,007,005}'::text[])))
>>> Rows Removed by Filter: 3
>>> Heap Blocks: exact=18307
>>> -> BitmapOr (cost=1547.81..1547.81 rows=40538
>>> width=0) (actual time=15.483..15.483 rows=0 loops=1)
>>> -> Bitmap Index Scan on example_list9_idx
>>> (cost=0.00..1523.10 rows=40538 width=0) (actual time=15.477..15.478
>>> rows=62851 loops=1)
>>> Index Cond: (((countrycode)::text =
>>> 'AD'::text) AND ((facilitycode)::text = 'ABCD'::text) AND
>>> ((internalexamplecode)::text = ANY ('{005,006,007,005}'::text[])) AND
>>> (examplestartdatetime >= '2020-05-18 00:00:00'::timestamp without time
>>> zone) AND (examplestartdatetime <= '2020-08-19 00:00:00'::timestamp without
>>> time zone))
>>> -> Bitmap Index Scan on example_list10_idx
>>> (cost=0.00..4.44 rows=1 width=0) (actual time=0.004..0.004 rows=3 loops=1)
>>> Index Cond: (examplestartdatetime IS NULL)
>>> -> Seq Scan on examplestatus js (cost=0.00..1.09 rows=9 width=4)
>>> (actual time=0.003..0.005 rows=9 loops=1)
>>> -> Hash (cost=0.08..0.08 rows=4 width=24) (actual
>>> time=89.650..89.651 rows=5 loops=1)
>>> Buckets: 1024 Batches: 1 Memory Usage: 9kB
>>> -> CTE Scan on examplecount jc (cost=0.00..0.08 rows=4
>>> width=24) (actual time=89.641..89.647 rows=5 loops=1)
>>> Planning Time: 0.470 ms
>>> Execution Time: 89.737 ms
>>>
>>> ------------------------exampleSelect-----------------------------------
>>>
>>>
>>> lmp_examples=> explain analyze select j.id from example j where 1=1
>>> and j.countrycode = 'AD' and j.facilitycode in ('ABCD') and
>>> j.examplestatuscode in ('101') and j.internalexamplecode in
>>> ('005','006','007','005') and ((j.examplestartdatetime between '2020-05-18
>>> 00:00:00' and '2020-08-19 00:00:00') or j.examplestartdatetime IS NULL)
>>> ORDER BY createddate DESC limit 10;
>>>
>>> QUERY PLAN
>>>
>>>
>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Limit (cost=71286.65..71286.68 rows=10 width=12) (actual
>>> time=47.351..47.359 rows=10 loops=1)
>>> -> Sort (cost=71286.65..71335.31 rows=19462 width=12) (actual
>>> time=47.349..47.352 rows=10 loops=1)
>>> Sort Key: createddate DESC
>>> Sort Method: top-N heapsort Memory: 25kB
>>> -> Bitmap Heap Scan on example j (cost=1176.77..70866.09
>>> rows=19462 width=12) (actual time=15.133..46.555 rows=2530 loops=1)
>>> Recheck Cond: (((countrycode)::text = 'AD'::text) AND
>>> ((facilitycode)::text = 'ABCD'::text) AND ((examplestatuscode)::text =
>>> '101'::text) AND ((internalexamplecode)::text = ANY
>>> ('{005,006,007,005}'::text[])))
>>> Filter: (((examplestartdatetime >= '2020-05-18
>>> 00:00:00'::timestamp without time zone) AND (examplestartdatetime <=
>>> '2020-08-19 00:00:00'::timestamp without time zone)) OR
>>> (examplestartdatetime IS NULL))
>>> Rows Removed by Filter: 38724
>>> Heap Blocks: exact=20923
>>> -> Bitmap Index Scan on example_list1_idx
>>> (cost=0.00..1171.90 rows=33211 width=0) (actual time=9.938..9.939
>>> rows=41254 loops=1)
>>> Index Cond: (((countrycode)::text = 'AD'::text) AND
>>> ((facilitycode)::text = 'ABCD'::text) AND ((examplestatuscode)::text =
>>> '101'::text) AND ((internalexamplecode)::text = ANY
>>> ('{005,006,007,005}'::text[])))
>>> Planning Time: 0.398 ms
>>> Execution Time: 47.416 ms
>>>
>>> Regards,
>>> Aditya.
>>>
>>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message aditya desai 2020-09-30 07:19:43 SSL connection getting rejected on AWS RDS
Previous Message Timothy Garnett 2020-09-29 00:22:22 Re: Is it possible to specify minimum number of rows planner should consider?