pg_rewind with cascade standby doesn't work well

From:	Kuwamura Masaki <kuwamura(at)db(dot)is(dot)i(dot)nagoya-u(dot)ac(dot)jp>
To:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	pg_rewind with cascade standby doesn't work well
Date:	2023-09-07 06:33:45
Message-ID:	CAMyC8qqnxBVjAM+a5WgQ+bFSvd2ZoA_wuzEDo-y8_jbLCtHbjQ@mail.gmail.com
Lists:	pgsql-hackers

Hi there,

I tested pg_rewind and found some suspicious behavior.

Consider the following scenario:

Server A: primary
Server B: replica of A
Server C: replica of B

Then A goes down for some reason, so B gets promoted.

Server A: down
Server B: new primary
Server C: replica of B

In this case, pg_rewind can be used to rebuild the cascade, with C as the
source and A as the target.
However, running pg_rewind produces the following output:

```
pg_rewind: fetched file "global/pg_control", length 8192
pg_rewind: source and target cluster are on the same timeline
pg_rewind: no rewind required
```
Although A should be on timeline 1 and C on timeline 2, pg_rewind reports
that they are on the same timeline.

This is because `pg_rewind` currently compares the TLI of the source and
target using minRecoveryPointTLI and the latest checkpoint's TimeLineID [1].
On C, neither minRecoveryPointTLI nor the latest checkpoint's TimeLineID is
updated until the next checkpoint (even though B's values are already
updated).
So if you run pg_rewind immediately after the promotion, it does nothing,
because C and A appear to be on the same timeline. As a workaround, we have
to run CHECKPOINT on C before running pg_rewind.
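
To make the reasoning above concrete, here is a small Python sketch of the comparison. It is only a simplified model, not pg_rewind's actual code: the `ControlData` class, the function names, and the rule of taking the larger of the two control-file values are my assumptions for illustration, and the timeline numbers are made up to match the scenario.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ControlData:
    """Hypothetical stand-in for the two timeline fields discussed above."""
    checkpoint_tli: int     # latest checkpoint's TimeLineID
    min_recovery_tli: int   # minRecoveryPointTLI


def effective_tli(cd: ControlData) -> int:
    # Assumption: treat the larger of the two recorded values as the
    # timeline the cluster appears to be on.
    return max(cd.checkpoint_tli, cd.min_recovery_tli)


def rewind_needed(source: ControlData, target: ControlData) -> bool:
    # Model of the "same timeline" check: a rewind is only attempted
    # when the two clusters appear to be on different timelines.
    return effective_tli(source) != effective_tli(target)


# Illustrative values only.
a = ControlData(checkpoint_tli=1, min_recovery_tli=1)         # old primary (target)
c_before = ControlData(checkpoint_tli=1, min_recovery_tli=1)  # C right after B's promotion

# Assumption: after CHECKPOINT on C, its control file reflects timeline 2.
c_after = replace(c_before, checkpoint_tli=2, min_recovery_tli=2)

print(rewind_needed(c_before, a))
print(rewind_needed(c_after, a))
```

In this model, C's stale control file makes the first check come out as "no rewind required", and only the values written by the checkpoint let the second check see the timeline switch.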

BTW, running pg_rewind immediately against a cascade standby seems to have
been raised already in another discussion [2], but unfortunately it was
missed.

Anyway, I don't think this behavior is user-friendly.
To fix this, should we use a different value to compare the TLIs?
Or should we update the cascade standby's minRecoveryPointTLI somehow?

Masaki Kuwamura

[1]
https://www.postgresql.org/message-id/flat/9f568c97-87fe-a716-bd39-65299b8a60f4%40iki.fi
[2]
https://www.postgresql.org/message-id/flat/aeb5f31a-8de2-40a8-64af-ab659a309d6b%40iki.fi
