Re: Trouble using pg_rewind to undo standby promotion

From: Craig McIlwee <craigm(at)vt(dot)edu>
To: Torsten Förtsch <tfoertsch123(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Trouble using pg_rewind to undo standby promotion
Date: 2024-11-07 13:26:46
Message-ID: CAGqBcTZKSYTuVmf6ppR=GKYPtgKKOp6DASaP6YZYUAks49EHoQ@mail.gmail.com
Lists: pgsql-general

On Thu, Nov 7, 2024 at 4:47 AM Torsten Förtsch <tfoertsch123(at)gmail(dot)com>
wrote:

> Your point of divergence is in the middle of the 7718/000000BF file. So,
> you should have 2 such files eventually, one on timeline 1 and the other on
> timeline 2.
>
> Are you archiving WAL on the promoted machine in a way that your
> restore_command can find it? Check archive_command and archive_mode on the
> promoted machine.
>

No, the promoted machine is not archiving. How should that work? Is it OK
for a log shipping standby that uses restore_command to also push to the
same directory with an archive_command or would that cause issues of trying
to read and write the same file simultaneously during WAL replay? Or
should I be setting up an archive_command that pushes to a separate
directory and have a restore_command that knows to check both locations?

Hmm, as I write that out, I realize that I could use archive_mode = on
instead of archive_mode = always to avoid the potential for read/write
conflicts during WAL replay. I can try this later and report back.
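
For the separate-directory option, a restore script that checks both
locations could look roughly like the sketch below. This is untested against
a real cluster. The function name restore_wal and the second directory
(/data/wal_archive/promoted) are made up for illustration; only
/data/wal_archive/operational exists in my setup today.

```shell
#!/bin/bash
# Sketch only: restore_wal and the "promoted" directory are hypothetical.
# Usage: restore_wal <dest path (%p)> <file name (%f)> <archive dir>...

# Search each archive directory in order and decompress the first match.
restore_wal() {
    local dest=$1 file=$2 dir
    shift 2
    for dir in "$@"; do
        if [ -f "$dir/$file.gz" ]; then
            gunzip < "$dir/$file.gz" > "$dest"
            return $?
        fi
    done
    # Non-zero exit tells postgres the file is not in any archive.
    return 1
}

# The script would end with a call such as:
# restore_wal "$1" "$2" /data/wal_archive/operational /data/wal_archive/promoted
```

Because the lookup is by file name only, WAL segments and .history files
would go through the same path.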

> Also, do your archive/restore scripts work properly for history files?
>

The scripts don't do anything special with history files. They are based
on the continuous archiving docs [1] and this article [2], with a slight
modification to include a throttled scp, since the log shipping server is
located in a different data center from the promoted standby and there is
limited bandwidth between the two. (Also note that the archive script from
[2] is adapted to properly handle file transfer failures. The one in the
article exits with the exit code of the rm command, so postgres won't be
informed when the file transfer fails, resulting in missing WAL in the
archive.)

Archive script:
---
#!/bin/bash

# $1 = %p
# $2 = %f

limit=10240 # 10Mbps

gzip < "/var/lib/pgsql/13/data/$1" > "/tmp/archive/$2.gz"

scp -l "$limit" "/tmp/archive/$2.gz" "postgres@x.x.x.x:/data/wal_archive/operational/$2.gz"
# Save the scp result so postgres sees transfer failures, not rm's status
exit_code=$?

rm "/tmp/archive/$2.gz"

exit $exit_code
---

Restore script:
---
gunzip < "/data/wal_archive/operational/$2.gz" > "$1"
---

[1]
https://www.postgresql.org/docs/13/continuous-archiving.html#COMPRESSED-ARCHIVE-LOGS
[2]
https://www.rockdata.net/tutorial/admin-archive-command/#compressing-and-archiving

Craig
