Re: Identify root-cause for intermittent spikes

From: Rick Otten <rottenwindfish(at)gmail(dot)com>
To: Sengottaiyan T <techsenko(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Identify root-cause for intermittent spikes
Date: 2022-10-12 00:42:19
Message-ID: CAMAYy4Kt7kxgQ80jOyp6LLcdaEkHLy7=em2PW6iDfu+JX008Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I like to use pgbadger to collect data on what is happening in RDS
instances. You have to turn up a bunch of logging in RDS:

1. Turn on connection logging, duration logging, lock_waits, and anything
else that you are interested in studying.

2. Then grab all of your postgresql logs from AWS. I wrote this little
bash script to pull all of the logs for a current day. It will work if you
have your aws credentials configured correctly and can run aws-cli commands.
```
#!/bin/env bash

## Return all of the postgresql log files saved by RDS since midnight.
## Save them in your current directory.
## This is so we can use cli tools like "grep"
## It is also really handy for feeding into pgbadger for deeper analysis.

# aws requires the timestamp to be in milliseconds.
# unfortunately date will provide either seconds or nano seconds, so we
have to do math.
midnight_timestamp=$(date -d $(date -I) '+%s')
midnight_timestamp_milliseconds=$(echo "${midnight_timestamp} * 1000" | bc)

logfiles=$(aws rds describe-db-log-files \
--profile default \
--db-instance-identifier "*some_rds_instance_name*" \
--output json \
--file-last-written ${midnight_timestamp_milliseconds} | jq
-r ".DescribeDBLogFiles[].LogFileName")

for logfile in $(echo ${logfiles})
do
# remove the leading "error/" so we can use the name to save it.
logfile_save=$(echo "${logfile}" | awk -F\/ '{print $NF}')

tput bold; echo "${logfile}"; tput sgr0
aws rds download-db-log-file-portion \
--profile admin \
--db-instance-identifier prod-notify-me-1 \
--log-file-name ${logfile} \
--output text \
--no-paginate > ${logfile_save}
done
```
3. Then run pgbadger:
``` ~/src/pgbadger/pgbadger -f rds postgresql*
```
4. Open the `out.html` in your browser, and poke around. There is a ton
of stuff you can find in all the drop down menus about what was happening
in your database over the time window you collected the logs for. The html
is generated as a standalone file by a perl script of all things. It is
pretty impressive.

On Tue, Oct 11, 2022 at 7:07 AM Sengottaiyan T <techsenko(at)gmail(dot)com> wrote:

> Hi All,
>>
>> I'm looking for suggestions:
>>
>> Environment: AWS PostgreSQL RDS instance - Version 14.3
>> Operations support gets intermittent alerts from the monitoring tool
>> through AWS cloud watch metrics on Disk Queue Depth, CPU burst-credit & CPU
>> Utilization.
>> I would like to understand what is causing the spike - is the number of
>> logon's increased, (or) number of transactions per second increased, (or)
>> SQL execution picked wrong plan and the long running (I/O, CPU or memory
>> intensive) SQL is increasing load on server (cause and effect scenario)
>> etc.,
>>
>> Due to the reactive nature of the issues, we rely on the metrics gathered
>> in the AWS cloud watch monitoring (for the underlying OS stats),
>> Performance Insights (for the DB performance) and correlate SQL queries
>> with pg_Stat_Statements view. But the data in the view is an aggregated
>> stats. And, I'm looking to see the deltas compared to normal runs.
>> How should I approach and get to the root-cause?
>>
>> AppDynamics is already configured for the RDS instance. Are there any
>> open source monitoring tools available which would help to capture and
>> visualize the deltas?
>>
>> Thanks,
>> Senko
>>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message SAMEER KUMAR 2022-10-12 06:50:31 Re: Identify root-cause for intermittent spikes
Previous Message MichaelDBA 2022-10-11 11:48:54 Re: Identify root-cause for intermittent spikes