| From: | Nitin Motiani <nitinmotiani(at)google(dot)com> |
|---|---|
| To: | vignesh C <vignesh21(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: long-standing data loss bug in initial sync of logical replication |
| Date: | 2024-07-10 17:52:36 |
| Message-ID: | CAH5HC95UsGrD1ez0vfNPCLP6vE3ptmaOkUECB5JsqLOCYjRW3w@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Wed, Jul 10, 2024 at 10:39 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > The patch missed to use the ShareRowExclusiveLock for partitions, see
> > attached. I haven't tested it but they should also face the same
> > problem. Apart from that, I have changed the comments in a few places
> > in the patch.
>
> I could not hit the updated ShareRowExclusiveLock changes through the
> partition table, instead I could verify it using the inheritance
> table. Added a test for the same and also attaching the backbranch
> patch.
>
Hi,
I tested alternative-experimental-fix-lock.patch provided by Tomas
(replaces SUE with SRE in OpenTableList). I believe there are a couple
of scenarios the patch does not cover.
1. It doesn't handle the case of "ALTER PUBLICATION <pub> ADD TABLES
IN SCHEMA <schema>".
I took crash-test.sh provided by Tomas and modified it to add all
tables in the schema to publication using the following command :
ALTER PUBLICATION p ADD TABLES IN SCHEMA public
The modified script is attached (crash-test-with-schema.sh). With this
script, I can reproduce the issue even with the patch applied. This is
because the code path to add a schema to the publication doesn't go
through OpenTableList.
I have also attached a script run-test-with-schema.sh to run
crash-test-with-schema.sh in a loop with randomly generated parameters
(modified from run.sh provided by Tomas).
2. The second issue is a deadlock which happens when the alter
publication command is run for a comma separated list of tables.
I created another script create-test-tables-order-reverse.sh. This
script runs a command like the following :
ALTER PUBLICATION p ADD TABLE test_2,test_1
Running the above script, I was able to get a deadlock error (the
output is attached in deadlock.txt). In the alter publication command,
I added the tables in the reverse order to increase the probability of
the deadlock. But it should happen with any order of tables.
I am not sure if the deadlock is a major issue because detecting the
deadlock is better than data loss. The schema issue is probably more
important. I didn't test it out with the latest patches sent by
Vignesh but since the code changes in that patch are also in
OpenTableList, I think the schema scenario won't be covered by those.
Thanks & Regards,
Nitin Motiani
Google
| Attachment | Content-Type | Size |
|---|---|---|
| deadlock.txt | text/plain | 6.8 KB |
| crash-test-with-schema.sh | application/x-sh | 9.1 KB |
| run-test-with-schema.sh | application/x-sh | 1.2 KB |
| crash-test-tables-order-reverse.sh | application/x-sh | 9.1 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kirill Reshke | 2024-07-10 17:57:45 | Re: Allow non-superuser to cancel superuser tasks. |
| Previous Message | Imran Zaheer | 2024-07-10 17:06:40 | Re: Windows: openssl & gssapi dislike each other |