Re: long-standing data loss bug in initial sync of logical replication

From: Nitin Motiani <nitinmotiani(at)google(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: long-standing data loss bug in initial sync of logical replication
Date: 2024-07-11 12:49:36
Message-ID: CAH5HC96rdZm7Q=a3g_WZEJTvbnaxXE+4BPBpY1jw6XrxHUG3PA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 10, 2024 at 11:22 PM Nitin Motiani <nitinmotiani(at)google(dot)com> wrote:
>
> On Wed, Jul 10, 2024 at 10:39 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > The patch missed to use the ShareRowExclusiveLock for partitions, see
> > > attached. I haven't tested it but they should also face the same
> > > problem. Apart from that, I have changed the comments in a few places
> > > in the patch.
> >
> > I could not hit the updated ShareRowExclusiveLock changes through the
> > partition table, instead I could verify it using the inheritance
> > table. Added a test for the same and also attaching the backbranch
> > patch.
> >
>
> Hi,
>
> I tested alternative-experimental-fix-lock.patch provided by Tomas
> (replaces SUE with SRE in OpenTableList). I believe there are a couple
> of scenarios the patch does not cover.
>
> 1. It doesn't handle the case of "ALTER PUBLICATION <pub> ADD TABLES
> IN SCHEMA <schema>".
>
> I took crash-test.sh provided by Tomas and modified it to add all
> tables in the schema to publication using the following command :
>
> ALTER PUBLICATION p ADD TABLES IN SCHEMA public
>
> The modified script is attached (crash-test-with-schema.sh). With this
> script, I can reproduce the issue even with the patch applied. This is
> because the code path to add a schema to the publication doesn't go
> through OpenTableList.
>
> I have also attached a script run-test-with-schema.sh to run
> crash-test-with-schema.sh in a loop with randomly generated parameters
> (modified from run.sh provided by Tomas).
>
> 2. The second issue is a deadlock which happens when the alter
> publication command is run for a comma separated list of tables.
>
> I created another script create-test-tables-order-reverse.sh. This
> script runs a command like the following :
>
> ALTER PUBLICATION p ADD TABLE test_2,test_1
>
> Running the above script, I was able to get a deadlock error (the
> output is attached in deadlock.txt). In the alter publication command,
> I added the tables in the reverse order to increase the probability of
> the deadlock. But it should happen with any order of tables.
>
> I am not sure if the deadlock is a major issue because detecting the
> deadlock is better than data loss. The schema issue is probably more
> important. I didn't test it out with the latest patches sent by
> Vignesh but since the code changes in that patch are also in
> OpenTableList, I think the schema scenario won't be covered by those.
>

Hi,

I looked further into the scenario of adding the tables in schema to
the publication. Since in that case, the entry is added to
pg_publication_namespace instead of pg_publication_rel, the codepaths
for 'add table' and 'add tables in schema' are different. And in the
'add tables in schema' scenario, the OpenTableList function is not
called to get the relation ids. Therefore even with the proposed
patch, the data loss issue still persists in that case.

To validate this idea, I tried locking all the affected tables in the
schema just before the invalidation for those relations (in
ShareRowExclusiveLock mode). I am attaching the small patch for that
(alter_pub_for_schema.patch) where the change is made in the function
publication_add_schema in pg_publication.c. I am not sure if this is
the best place to make this change or if it is the right fix. It is
conceptually similar to the proposed change in OpenTableList but here
we are not just changing the lockmode but taking locks which were not
taken before. But with this change, the data loss errors went away in
my test script.

Another issue which persists with this change is the deadlock. Since
multiple table locks are acquired, the test script detects deadlock a
few times. Therefore I'm also attaching another modified script which
does a few retries in case of deadlock. The script is
crash-test-with-retries-for-schema.sh. It runs the following command
in a retry loop :

ALTER PUBLICATION p ADD TABLES IN SCHEMA public

If the command fails, it sleeps for a random amount of time (upper
bound by a MAXWAIT parameter) and then retries the command. If it
fails to run the command in the max number of retries, the final
return value from the script is DEADLOCK as we can't do a consistency
check in this scenario. Also attached is another script
run-with-deadlock-detection.sh which can run the above script for
multiple iterations.

I tried the test scripts with and without alter_pub_for_schema.patch.
Without the patch, I get the final output ERROR majority of the time
which means that the publication was altered successfully but the data
was lost on the subscriber. When I run it with the patch, I get a mix
of OK (no data loss) and DEADLOCK (the publication was not altered)
but no ERROR. I think by changing the parameters of sleep time and
number of retries we can get different fractions of OK and DEADLOCK.

I am not sure if this is the right or a clean way to fix the issue but
I think conceptually this might be the right direction. Please let me
know if my understanding is wrong or if I'm missing something.

Thanks & Regards,
Nitin Motiani
Google

Attachment Content-Type Size
alter_pub_for_schema.patch application/octet-stream 1017 bytes
run-with-deadlock-detection.sh text/x-sh 1.3 KB
crash-test-with-retries-for-schema.sh text/x-sh 9.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-07-11 12:50:40 Re: Built-in CTYPE provider
Previous Message Justin Pryzby 2024-07-11 12:31:25 Re: CREATE INDEX CONCURRENTLY on partitioned index