very long secondary->primary switch time

From: Tomas Pospisek <tpo2(at)sourcepole(dot)ch>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: very long secondary->primary switch time
Date: 2021-04-27 17:15:03
Message-ID: 5064b6f7-e4bd-d550-482a-f0bdd527f3d9@sourcepole.ch
Lists: pgsql-general

Hello all,

I maintain a PostgreSQL cluster that does failover via Patroni. The
problem is that after a failover the secondary takes too long (about
35 min) to come up and answer queries. The log of the secondary looks
like this:

04:00:29.777 [9679] LOG: received promote request
04:00:29.780 [9693] FATAL: terminating walreceiver process due to administrator command
04:00:29.780 [9679] LOG: invalid record length at 320/B95A1EE0: wanted 24, got 0
04:00:29.783 [9679] LOG: redo done at 320/B95A1EA8
04:00:29.783 [9679] LOG: last completed transaction was at log time 2021-03-03 03:57:46.466342+01

04:35:00.982 [9679] LOG: selected new timeline ID: 15
04:35:01.404 [9679] LOG: archive recovery complete
04:35:02.337 [9662] LOG: database system is ready to accept connections
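
Going only by the log above, "redo done" is reported at 04:00:29.783, before the silent period starts. As a rough cross-check of how far replay got, a PostgreSQL LSN such as 320/B95A1EE0 is a 64-bit WAL position written as two hexadecimal halves (high 32 bits / low 32 bits), so the byte distance between the two positions in the log can be computed the same way pg_wal_lsn_diff() does on the server. A minimal Python sketch; lsn_to_int and lsn_diff are hypothetical helper names, not part of any PostgreSQL tooling:

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN in 'XXX/YYYYYYYY' text form to a 64-bit integer.

    The part before the slash holds the high 32 bits and the part after it
    holds the low 32 bits, both written in hexadecimal.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Return a - b in bytes, like pg_wal_lsn_diff(a, b) on the server."""
    return lsn_to_int(a) - lsn_to_int(b)


if __name__ == "__main__":
    end_of_wal = "320/B95A1EE0"  # from "invalid record length at ..."
    redo_done = "320/B95A1EA8"   # from "redo done at ..."
    print(lsn_diff(end_of_wal, redo_done))
```

For these two positions it prints 56: the last replayed record starts only 56 bytes before the point where reading stopped, which suggests replay had already consumed all of the WAL the standby had received by 04:00:29.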

The cluster is "fairly large" with thousands of DBs (sic!) and ~1TB of data.

I would like to shorten the failover/startup time drastically. Why does
it take the secondary so long to switch to the primary role? There are
no log entries between 04:00 and 04:35. What is PostgreSQL doing during
those 35 minutes?
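
To put an exact figure on that silent period, a small Python sketch can find the longest pause between consecutive lines of the excerpt. It assumes each line starts with an HH:MM:SS.fff timestamp and that the log does not cross midnight (the excerpt carries no date); LOG_LINES, parse_time and largest_gap are illustrative names only:

```python
from __future__ import annotations

from datetime import datetime, timedelta

# The log excerpt from this mail; only the leading timestamp is used.
LOG_LINES = [
    "04:00:29.777 [9679] LOG: received promote request",
    "04:00:29.780 [9693] FATAL: terminating walreceiver process due to administrator command",
    "04:00:29.780 [9679] LOG: invalid record length at 320/B95A1EE0: wanted 24, got 0",
    "04:00:29.783 [9679] LOG: redo done at 320/B95A1EA8",
    "04:00:29.783 [9679] LOG: last completed transaction was at log time 2021-03-03 03:57:46.466342+01",
    "04:35:00.982 [9679] LOG: selected new timeline ID: 15",
    "04:35:01.404 [9679] LOG: archive recovery complete",
    "04:35:02.337 [9662] LOG: database system is ready to accept connections",
]


def parse_time(line: str) -> datetime:
    # Assumes the line begins with HH:MM:SS.fff and no date.
    return datetime.strptime(line.split(" ", 1)[0], "%H:%M:%S.%f")


def largest_gap(lines: list[str]) -> tuple[timedelta, str, str]:
    """Return the longest pause between consecutive lines and the two lines around it."""
    best = (timedelta(0), "", "")
    for prev, cur in zip(lines, lines[1:]):
        gap = parse_time(cur) - parse_time(prev)
        if gap > best[0]:
            best = (gap, prev, cur)
    return best


if __name__ == "__main__":
    gap, before, after = largest_gap(LOG_LINES)
    print(f"{gap} of silence")
    print("before:", before)
    print("after: ", after)
```

For this excerpt it reports 0:34:31.199000, between the "last completed transaction" line and "selected new timeline ID: 15".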

I am *guessing* that PostgreSQL *might* be doing some consistency check
or replaying WAL (max_wal_size: 16 GB, wal_keep_segments: 100). I am
also *guessing* that the startup time *might* depend on the size of the
data (~1 TB) and/or on the number of DBs (thousands). If that is the
case, would splitting the cluster into multiple clusters allow for
faster startup times?
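
To put numbers on the "thousands of DBs" guess, one could count the per-database directories and the files inside them. In PostgreSQL's on-disk layout each database in the default tablespace has a subdirectory of PGDATA/base named after its OID. This Python sketch assumes that layout, skips non-numeric entries such as pgsql_tmp, and does not follow tablespaces under pg_tblspc; the function name and the way PGDATA is passed in are hypothetical:

```python
from __future__ import annotations

import os
import sys
from pathlib import Path


def count_db_files(pgdata: str) -> tuple[int, int]:
    """Return (database directories, total files) under PGDATA/base.

    Only numeric (OID-named) subdirectories are counted as databases,
    so base/pgsql_tmp is ignored. Tablespaces are not followed.
    """
    base = Path(pgdata) / "base"
    n_dbs = 0
    n_files = 0
    for entry in base.iterdir():
        if entry.is_dir() and entry.name.isdigit():
            n_dbs += 1
            n_files += sum(1 for f in entry.iterdir() if f.is_file())
    return n_dbs, n_files


if __name__ == "__main__":
    # Hypothetical usage: python count_db_files.py /path/to/pgdata
    pgdata = sys.argv[1] if len(sys.argv) > 1 else os.environ.get("PGDATA")
    if pgdata and (Path(pgdata) / "base").is_dir():
        dbs, files = count_db_files(pgdata)
        print(f"{dbs} database directories, {files} files under {pgdata}/base")
    else:
        print("usage: count_db_files.py PGDATA  (or set $PGDATA)")
```

If startup work does scale with the number of databases or files, these two counts are the quantities that splitting the cluster would reduce per instance.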

I have tried to search (DuckDuckGo) for why the secondary takes so much
time to switch to primary mode, but have failed to find information
that would enlighten me. So any pointers to information, hints or help
are very welcome.

Thanks & greets,
*t
