From: | Yi Sun <yinan81(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: received immediate shutdown request caused cluster failover |
Date: | 2020-11-20 09:53:38 |
Message-ID: | CABWY_HAM1HNAPQcRMzOZeOdjEMV9pcPwy_PE0bvG+Xha6u8YkQ@mail.gmail.com |
Lists: | pgsql-general |
Hello,
Thank you for your reply.
This is the reply I got from Patroni:
"It seems your system is under so much stress that there was no resources
for Patroni to execute HA loop for 35 seconds.
This interval exceeds ttl=30s, therefore the leader key expired, Patroni
noticed it and demoted Postgres.
You need to figure out what is going on with your system, and what is the
reason for cpu/memory pressure. Ideally fix these issues."
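To make the arithmetic in that reply concrete, here is a minimal Python sketch of the expiry check. It is not Patroni's actual code; the function name and the 10-second "healthy" interval are assumptions made up for illustration. Only ttl=30s and the 35-second stall come from the reply.

```python
def leader_key_expired(last_refresh_s: float, now_s: float, ttl_s: float) -> bool:
    """Return True if the leader key was not refreshed within its TTL.

    Hypothetical helper for illustration only, not part of Patroni.
    """
    return (now_s - last_refresh_s) > ttl_s


# Numbers from the reply: the HA loop did not run for 35 s while ttl=30 s.
stalled = leader_key_expired(last_refresh_s=0.0, now_s=35.0, ttl_s=30.0)

# Assumed healthy case: the loop refreshes the key again after 10 s.
healthy = leader_key_expired(last_refresh_s=0.0, now_s=10.0, ttl_s=30.0)

print(stalled, healthy)
```

Once the gap between two refreshes exceeds the TTL, the key is gone no matter how healthy Postgres itself is, which matches the demotion described above.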
Our company has hundreds of clusters deployed with Ansible using the same
parameters, so changing parameters for a single cluster is difficult.
I am thinking of pulling the top SQL statements from pg_stat_statements, as
below, and then analysing and tuning them.
Is this the right direction? Any suggestions would be appreciated, thanks.
Top 5 SQL by I/O time per call:
select userid::regrole, dbid, query from pg_stat_statements order by
(blk_read_time+blk_write_time)/calls desc limit 5;
Top 5 SQL by total I/O time:
select userid::regrole, dbid, query from pg_stat_statements order by
(blk_read_time+blk_write_time) desc limit 5;
Top 5 SQL by mean execution time:
select userid::regrole, dbid, query from pg_stat_statements order by
mean_time desc limit 5;
Top 5 SQL by total execution time:
select userid::regrole, dbid, query from pg_stat_statements order by
total_time desc limit 5;
Top 5 SQL by average execution time (total_time/calls):
select calls, total_time/calls AS avg_time, left(query,80) from
pg_stat_statements order by 2 desc limit 5;
Top 5 SQL by standard deviation of execution time:
select userid::regrole, dbid, query from pg_stat_statements order by
stddev_time desc limit 5;
Top 5 SQL by shared blocks (hit + dirtied):
select userid::regrole, dbid, query from pg_stat_statements order by
(shared_blks_hit+shared_blks_dirtied) desc limit 5;
Top 5 SQL by temp blocks written:
select userid::regrole, dbid, query from pg_stat_statements order by
temp_blks_written desc limit 5;
On Fri, Nov 20, 2020 at 2:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yi Sun <yinan81(at)gmail(dot)com> writes:
> > Besides command run(like pg_ctl) can cause "received immediate shutdown
> > request" any other reason can cause this please?
>
> That message indicates that something sent the postmaster process a
> SIGQUIT signal (which is all that "pg_ctl stop -m immediate" does).
> There's no speculation to that: a look at postmaster.c will convince
> you that there is no other way to reach that message. So you need
> to be looking for things that would be sending SIGQUIT unexpectedly.
>
> I don't know much about Patroni, but maybe something in that
> environment thinks that SIGQUIT'ing random processes is a good
> thing to do.
>
> regards, tom lane
>
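As a small illustration of the point in the quoted reply, the Python sketch below shows that SIGQUIT is an ordinary signal that any process with permission can send to a PID. It runs on Unix-like systems only and signals itself rather than a postmaster; the handler and the `received` list are assumptions for the demo and have nothing to do with PostgreSQL's own signal handling.

```python
import os
import signal

received = []  # names of signals seen by the demo handler


def on_sigquit(signum, frame):
    # Record the signal instead of terminating, so the demo can finish.
    received.append(signal.Signals(signum).name)


# Install the handler, then send SIGQUIT to our own PID, the same way
# an external tool would signal another process it is allowed to signal.
signal.signal(signal.SIGQUIT, on_sigquit)
os.kill(os.getpid(), signal.SIGQUIT)

print(received)
```

Nothing in the signal identifies why it was sent, which is why the sender has to be found by looking at what else runs on the host.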