From: | Michail Nikolaev <michail(dot)nikolaev(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Data loss on logical replication, 12.12 to 14.5, ALTER SUBSCRIPTION |
Date: | 2023-01-03 08:43:54 |
Message-ID: | CANtu0ogdMKQ-qj7U8qdCzw+YhOcdTLoLRa5evrdahkrwjSDMiA@mail.gmail.com |
Lists: | pgsql-hackers |
Hello, Amit.
> The point which is not completely clear from your description is the
> timing of missing records. In one of your previous emails, you seem to
> have indicated that the data missed from Table B is from the time when
> the initial sync for Table B was in-progress, right? Also, from your
> description, it seems there is no error or restart that happened
> during the time of initial sync for Table B. Is that understanding
> correct?
Yes and yes.
* B sync started - 08:08:34
* lost records are created - 09:49:xx
* B initial sync finished - 10:19:08
* I/O error with WAL - 10:19:22
* SIGTERM - 10:35:20
"Finished" here is `logical replication table synchronization worker
for subscription "cloud_production_main_sub_v4", table "B" has
finished`.
As far as I know, that message is about the COPY command.
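To make the ordering explicit, here is a minimal sketch that checks the timeline above. The timestamps are copied from the list; the variable names are only illustrative, and because the seconds of the lost records are not known (09:49:xx), the check covers the whole minute rather than guessing a value.

```python
from datetime import time

# Timestamps copied from the timeline above (all on the same day).
sync_started = time(8, 8, 34)     # B sync started
sync_finished = time(10, 19, 8)   # B initial sync finished
wal_io_error = time(10, 19, 22)   # I/O error with WAL
sigterm = time(10, 35, 20)        # SIGTERM

# The seconds of the lost records are unknown (09:49:xx),
# so test both ends of that minute instead of picking a value.
lost_earliest = time(9, 49, 0)
lost_latest = time(9, 49, 59)


def within_initial_sync(t):
    """Return True if t falls inside the initial sync window of table B."""
    return sync_started <= t <= sync_finished


# The lost records were created while the initial sync was still running.
lost_during_sync = (
    within_initial_sync(lost_earliest) and within_initial_sync(lost_latest)
)

# The I/O error and the SIGTERM both happened only after the sync finished.
errors_after_sync = sync_finished < wal_io_error < sigterm

print(lost_during_sync, errors_after_sync)
```

So the lost records fall inside the initial sync window, and both the I/O error and the SIGTERM come after it.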
> I am not able to see how these steps can lead to the problem.
One idea I have is that it is somehow related to the patch that
forbids canceling queries while they wait for synchronous
replication acknowledgement [1].
That patch is applied to the Postgres in the cloud we were using [2]. We
started to see errors like this at 10:24:18:
`The COMMIT record has already flushed to WAL locally and might
not have been replicated to the standby. We must wait here.`
I wonder whether it could be some tricky race caused by the downtime of
the synchronous replica, with queries stuck waiting for the ACK forever?
> If the problem is reproducible at your end, you might want to increase LOG
> verbosity to DEBUG1 and see if there is additional information in the
> LOGs that can help or it would be really good if there is a
> self-sufficient test to reproduce it.
Unfortunately, it looks like it is really hard to reproduce.
Best regards,
Michail.
[1]: https://www.postgresql.org/message-id/flat/CALj2ACU%3DnzEb_dEfoLqez5CLcwvx1GhkdfYRNX%2BA4NDRbjYdBg%40mail.gmail.com#8b7ffc8cdecb89de43c0701b4b6b5142
[2]: https://www.postgresql.org/message-id/flat/CAAhFRxgcBy-UCvyJ1ZZ1UKf4Owrx4J2X1F4tN_FD%3Dfh5wZgdkw%40mail.gmail.com#9c71a85cb6009eb60d0361de82772a50