From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: doc: Mention clock synchronization recommendation for hot_standby_feedback |
Date: | 2024-12-18 09:33:33 |
Message-ID: | CAA4eK1JG1R4c7DDEdr7QAiQ1sFjb-EkQmp1H=dSKguoKX7PZDg@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Dec 5, 2024 at 3:14 PM Jakub Wartak
<jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
>
> One of our customers ran into a very odd case where hot standby feedback backend_xmin propagation stopped working due to major (hours/days) clock shifts on hypervisor-managed VMs. This happens (and is fully reproducible) e.g. in scenarios where the standby connects while its own VM has a time from the future (relative to the primary) and then that time goes back to "normal". In such a situation the "sending hot standby feedback xmin" messages stop being sent, e.g.:
>
> 2024-12-05 02:02:35 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending write 6/E9015230 flush 6/E9015230 apply 6/E9015230
> 2024-12-05 02:02:45 UTC [6002]: db=,user=,app=,client= DEBUG: sending hot standby feedback xmin 1614031 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
> <-- clock readjustment and no further "sending hot standby feedback"
...
>
> I can share reproduction steps if anyone is interested. This basically happens due to the use of TimestampDifferenceExceeds() in XLogWalRcvSendHSFeedback(), but I bet there are other similar scenarios.
>
We started to use a different mechanism in HEAD. See XLogWalRcvSendHSFeedback().
> What surprised me was the lack of a recommendation to keep the primary and standby clocks synced when using hot_standby_feedback, even though such a recommendation exists for recovery_min_apply_delay. So I would like to add at least one sentence to hot_standby_feedback to warn about this too, patch attached.
>
IIUC, this issue is not caused by the primary and standby clocks being
out of sync. It happened because the clock on the standby moved
backward. This is quite unlike 'recovery_min_apply_delay', where
unsynchronized clocks between the primary and standby can lead to
unexpected results. For hot_standby_feedback, we don't compare any
time on the primary with the time on the standby. If this understanding
is correct, then the wording proposed by your patch should be changed
accordingly.
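
To illustrate the point, here is a minimal sketch of how an interval
check based only on the local wall clock can stall after that clock
moves backward. This is not the PostgreSQL source: interval_elapsed(),
maybe_send_feedback() and feedback_state are hypothetical names, and the
check is only modeled on the general idea of comparing "now" with the
time of the last send.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Microseconds since an arbitrary epoch; a stand-in for a wall-clock timestamp. */
typedef int64_t wallclock_us;

/*
 * Hypothetical helper: true when at least interval_ms milliseconds
 * separate start and stop. A negative difference never qualifies.
 */
static bool
interval_elapsed(wallclock_us start, wallclock_us stop, int interval_ms)
{
    assert(interval_ms >= 0);
    return (stop - start) >= (int64_t) interval_ms * 1000;
}

/* Hypothetical sender state: local wall-clock time of the last message. */
typedef struct
{
    wallclock_us last_send;
    int          sent_count;
} feedback_state;

/*
 * Send only if the interval has elapsed since the last send.
 * Only the local clock is consulted; no remote timestamp is involved.
 */
static bool
maybe_send_feedback(feedback_state *st, wallclock_us now, int interval_ms)
{
    if (!interval_elapsed(st->last_send, now, interval_ms))
        return false;
    st->last_send = now;
    st->sent_count++;
    return true;
}
```

If last_send is recorded while the local clock is an hour ahead and the
clock is then stepped back, every later check sees a negative
difference and nothing is sent until the clock catches up with the
stored value. The other node's clock plays no part in this.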
--
With Regards,
Amit Kapila.