From: | Nitin Motiani <nitinmotiani(at)google(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: long-standing data loss bug in initial sync of logical replication |
Date: | 2024-07-10 17:52:36 |
Message-ID: | CAH5HC95UsGrD1ez0vfNPCLP6vE3ptmaOkUECB5JsqLOCYjRW3w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 10, 2024 at 10:39 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > The patch missed to use the ShareRowExclusiveLock for partitions, see
> > attached. I haven't tested it but they should also face the same
> > problem. Apart from that, I have changed the comments in a few places
> > in the patch.
>
> I could not hit the updated ShareRowExclusiveLock changes through the
> partition table, instead I could verify it using the inheritance
> table. Added a test for the same and also attaching the backbranch
> patch.
>
Hi,
I tested alternative-experimental-fix-lock.patch provided by Tomas
(replaces SUE with SRE in OpenTableList). I believe there are a couple
of scenarios the patch does not cover.
1. It doesn't handle the case of "ALTER PUBLICATION <pub> ADD TABLES
IN SCHEMA <schema>".
I took crash-test.sh provided by Tomas and modified it to add all
tables in the schema to publication using the following command :
ALTER PUBLICATION p ADD TABLES IN SCHEMA public
The modified script is attached (crash-test-with-schema.sh). With this
script, I can reproduce the issue even with the patch applied. This is
because the code path to add a schema to the publication doesn't go
through OpenTableList.
I have also attached a script run-test-with-schema.sh to run
crash-test-with-schema.sh in a loop with randomly generated parameters
(modified from run.sh provided by Tomas).
2. The second issue is a deadlock which happens when the alter
publication command is run for a comma separated list of tables.
I created another script create-test-tables-order-reverse.sh. This
script runs a command like the following :
ALTER PUBLICATION p ADD TABLE test_2,test_1
Running the above script, I was able to get a deadlock error (the
output is attached in deadlock.txt). In the alter publication command,
I added the tables in the reverse order to increase the probability of
the deadlock. But it should happen with any order of tables.
I am not sure if the deadlock is a major issue because detecting the
deadlock is better than data loss. The schema issue is probably more
important. I didn't test it out with the latest patches sent by
Vignesh but since the code changes in that patch are also in
OpenTableList, I think the schema scenario won't be covered by those.
Thanks & Regards,
Nitin Motiani
Google
Attachment | Content-Type | Size |
---|---|---|
deadlock.txt | text/plain | 6.8 KB |
crash-test-with-schema.sh | application/x-sh | 9.1 KB |
run-test-with-schema.sh | application/x-sh | 1.2 KB |
crash-test-tables-order-reverse.sh | application/x-sh | 9.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kirill Reshke | 2024-07-10 17:57:45 | Re: Allow non-superuser to cancel superuser tasks. |
Previous Message | Imran Zaheer | 2024-07-10 17:06:40 | Re: Windows: openssl & gssapi dislike each other |