From: | Tatsuro Yamada <tatsuro(dot)yamada(dot)tf(at)nttcom(dot)co(dot)jp> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Duplicate history file? |
Date: | 2021-06-01 04:03:22 |
Message-ID: | 9bd1cc76-5fb8-6954-dce2-ab8ca56642ef@nttcom.co.jp_1 |
Lists: | pgsql-hackers |
Hi Horiguchi-san,
On 2021/05/31 16:58, Kyotaro Horiguchi wrote:
> So, I started a thread for this topic diverged from the following
> thread.
>
> https://www.postgresql.org/message-id/4698027d-5c0d-098f-9a8e-8cf09e36a555@nttcom.co.jp_1
>
>> So, what should we do for the user? I think we should put some notes
>> in postgresql.conf or in the documentation. For example, something
>> like this:
>
> I'm not sure about the exact configuration you have in mind, but that
> would happen on the cascaded standby in the case where the upstream
> promotes. In this case, the history file for the new timeline is
> archived twice. walreceiver triggers archiving of the new history
> file at the time of the promotion, then startup does the same when it
> restores the file from archive. Is it what you complained about?
Thank you for creating a new thread and explaining this.
We are not using cascading replication in our environment, but I think
the situation is similar. In short, when I promote a server,
archive_command fails because of the history file.
I've written a reproduction script that also sets up replication,
and I've attached it. (I used Robert's test.sh as a reference
when writing the reproduction script. Thanks)
The scenario (sr_test_historyfile.sh) is as follows.
#1 Start pgprimary as the primary
#2 Create standby
#3 Start pgstandby as a standby
#4 Execute archive_command
#5 Shutdown pgprimary
#6 Start pgprimary as a standby
#7 Promote pgprimary
#8 Execute archive_command again, but it fails because a duplicate
history file already exists (see pgstandby.log)
Note that this may not be an appropriate recovery procedure for a
replication configuration. However, I'm sharing it as is because it
appears to be the procedure used in the customer's environment (PG-REX).
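To illustrate why #8 fails, here is a minimal sketch, assuming the
archive_command follows the "test ! -f" pattern shown in the PostgreSQL
documentation. The function name archive_no_overwrite is hypothetical and
is not taken from sr_test_historyfile.sh.

```shell
# Hypothetical stand-in for:
#   archive_command = 'test ! -f ARCHIVEDIR/%f && cp %p ARCHIVEDIR/%f'
# $1 plays the role of %p (source path), $2 the archived file path.
archive_no_overwrite() {
    # Refuse to archive when the target already exists, even if the
    # existing file is byte-for-byte identical to the source.
    test ! -f "$2" && cp "$1" "$2"
}
```

With this pattern, the first attempt to archive a given history file
succeeds and any later attempt for the same file name returns non-zero,
regardless of whether the contents match.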
> The same workaround using the alternative archive script works for the
> case.
>
> We could check pg_wal before fetching archive, however, archiving is
> not controlled so strictly that duplicate archiving never happens and
> I think we choose possible duplicate archiving than having holes in
> archive. (so we suggest the "test ! -f" script)
>
>> ====
>> Note: If you use archive_mode=always, the archive_command on the
>> standby side should not be used "test ! -f".
>> ====
>
> It could be one workaround. However, I would suggest not to overwrite
> existing files (with a file with different content) to protect archive
> from corruption.
>
> We might need to write that in the documentation...
I think you're right: replacing it with an alternative archive script
that uses the cmp command will resolve the error, because I confirmed
with the diff command that the history files are identical.
=====
$ diff -s pgprimary/arc/00000002.history pgstandby/arc/00000002.history
Files pgprimary/arc/00000002.history and pgstandby/arc/00000002.history are identical
=====
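As a rough sketch of what such an alternative archive script could look
like (my assumption, not the script from the other thread): archive_wal is
a hypothetical name, $1 corresponds to %p and $2 to the destination path
in the archive.

```shell
# Hypothetical archive helper: archive_wal SRC DEST
# Returns 0 when DEST was created or already holds identical content,
# and non-zero when DEST exists with different content or the copy fails.
archive_wal() {
    src=$1
    dest=$2
    if [ -f "$dest" ]; then
        # A duplicate with identical content counts as already archived;
        # a file with different content is never overwritten.
        cmp -s "$src" "$dest"
        return $?
    fi
    # Copy under a temporary name, then rename, so that a partially
    # written file never appears under the final name.
    cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}
```

With this behavior, the identical 00000002.history above would be accepted
on the second attempt, while a file with the same name but different
content would still be rejected.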
Regarding "test ! -f",
I wonder how many people use the test command in archive_command.
If I remember correctly, the guide provided by NTT OSS Center that
we are using does not recommend using the test command.
Regards,
Tatsuro Yamada
Attachment | Content-Type | Size |
---|---|---|
pgprimary.log | text/plain | 3.0 KB |
pgstandby.log | text/plain | 7.0 KB |
sr_test_historyfile.sh | text/plain | 2.5 KB |