Re: Temporary disabling a replica in a Patroni cluster

From: Victor Sudakov <vas(at)sibptus(dot)ru>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Temporary disabling a replica in a Patroni cluster
Date: 2023-08-26 18:12:22
Message-ID: ZOpAhjcKL5jwCGUg@admin.sibptus.ru
Lists: pgsql-admin

Georg H. wrote:
> Hello Victor,
>
>
> On 25.08.2023 at 13:18, Victor Sudakov wrote:
> > Dear Colleagues,
> >
> > Do you perchance know the correct procedure for temporarily
> > taking down a replica in a Patroni cluster, e.g. for 5-10 minutes of
> > hardware maintenance?
> >
> > The problem is that after stopping the patroni process (service) on a
> > replica, Patroni removes the corresponding physical replication slot
> > from the leader, and unless the wal_keep_size value is insanely high,
> > the replica, when up again, cannot resume streaming because the WAL
> > segments it needs are already gone from the leader.
> >
> > Well, you all know:
> > <%%%>LOG: started streaming WAL from primary at B4A0/E2000000 on timeline 8
> > <%%%>FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000080000B4A0000000E2 has already been removed
> > <%%%>LOG: waiting for WAL to become available at B4A0/E2002000
> >
> > Do you think there is a way to tell Patroni that a replica is down
> > temporarily and its replication slot should not be removed?
> >
> > Or, what am I missing?
>
>
> you may use patronictl pause + resume

I would like to do the maintenance on one node only and keep the rest
of the cluster functioning normally.

>
> Keep in mind to set wal_keep_size (or wal_keep_segments, depending on
> your PG version) high enough.

As I wrote above: "unless the wal_keep_size value is insanely high" :-)

Keeping wal_keep_size very high is a waste of disk space and still
provides no real guarantee, unfortunately. Why does Patroni use slots
at all then?
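
To illustrate the trade-off, here is a rough sizing sketch in Python. The
WAL rate, the downtime and the safety factor are made-up numbers, and the
function name is hypothetical; only the 16 MB default WAL segment size
comes from PostgreSQL itself.

```python
import math


def wal_keep_size_mb(wal_rate_mb_per_min: float,
                     downtime_min: float,
                     safety_factor: float = 2.0,
                     segment_mb: int = 16) -> int:
    """Estimate a wal_keep_size value in MB, rounded up to whole WAL segments.

    All inputs are assumptions supplied by the caller; the result is only
    as good as the guess about the WAL generation rate.
    """
    # WAL expected to be written while the replica is down, with headroom.
    needed_mb = wal_rate_mb_per_min * downtime_min * safety_factor
    # WAL is retained in whole segments, so round up to a segment boundary.
    segments = math.ceil(needed_mb / segment_mb)
    return segments * segment_mb


# Hypothetical workload: 50 MB of WAL per minute, 10 minutes of maintenance.
print(wal_keep_size_mb(50, 10))  # 1008
```

If the maintenance runs longer or the write load exceeds the assumed rate,
the estimate is exceeded and the replica still cannot catch up, while the
disk space stays reserved on the leader all the time.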

--
Victor Sudakov VAS4-RIPE
http://vas.tomsk.ru/
2:5005/49(at)fidonet
