| From: | Alexander Kukushkin <cyberdemn(at)gmail(dot)com> |
|---|---|
| To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
| Cc: | michael(at)paquier(dot)xyz, pgsql-hackers(at)postgresql(dot)org, thomas(dot)munro(at)gmail(dot)com |
| Subject: | Re: Infinite loop in XLogPageRead() on standby |
| Date: | 2024-03-15 07:20:15 |
| Message-ID: | CAFh8B=kW0SyFmmXLXTdkgKYrkSVVCZAgRqW6zSoz8L+NPwtwJQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Kyotaro,
On Wed, 13 Mar 2024 at 03:56, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:
> I identified the cause of the second issue. When I tried to replay the
> issue, the second standby accidentally received the old timeline's
> last page-spanning record till the end while the first standby was
> promoting (but it had not been read by recovery). In addition to that,
> on the second standby, there's a time window where the timeline
> increased but the first segment of the new timeline is not available
> yet. In this case, the second standby successfully reads the
> page-spanning record in the old timeline even after the second standby
> noticed that the timeline ID has been increased, thanks to the
> robustness of XLogFileReadAnyTLI().
>
Hmm, I don't think it could really be prevented.
There is always a chance that a standby that is not ahead of the other
standbys gets promoted, for reasons like:
1. The HA configuration doesn't allow certain nodes to be promoted.
2. The standby that was ahead is an async standby (its name isn't listed in
synchronous_standby_names) and it was ahead of the promoted sync standby. No
data loss from the client's point of view.
> Of course, regardless of the changes above, if recovery on the second
> standby had reached the end of the page-spanning record before
> redirection to the first standby, it would need pg_rewind to connect
> to the first standby.
>
Correct, IMO pg_rewind is the right way of solving it.
Regards,
--
Alexander Kukushkin