RE: Fix 035_standby_logical_decoding.pl race conditions

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Amit Kapila' <amit(dot)kapila16(at)gmail(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Fix 035_standby_logical_decoding.pl race conditions
Date: 2025-03-28 09:02:29
Message-ID: OSCPR01MB1496683CB3BD0B4ACD1BADAAFF5A02@OSCPR01MB14966.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Amit,

>
> Right, I think this is a better idea. I have a few comments:
> 1.
> + /*
> + * In 035_standby_logical_decoding.pl, RUNNING_XACTS could move slots's
> + * xmin forward and cause random failures.
>
> No need to use test file name in code comments.

Fixed.

> 2. The comments atop wait_until_vacuum_can_remove can be changed to
> indicate that we will avoid logging running_xact with the help of
> injection points.

Comments were updated for the master. In back-branches, they were removed
because the risk was removed.

> 3.
> + # Note that from this point the checkpointer and bgwriter will wait before
> + # they write RUNNING_XACT record.
> + $node_primary->safe_psql('testdb',
> + "SELECT injection_points_attach('log-running-xacts', 'wait');");
>
> Isn't it better to use 'error' as the second parameter as we don't
> want to wait at this injection point?

Right, and the comment atop it was updated.

> 4.
> + # XXX If the instance does not attach 'log-running-xacts', the bgwriter
> + # pocess would generate RUNNING_XACTS record, so that the test would fail.
> + sleep(20);
>
> I think it is better to make a separate patch (as a first patch) for
> this so that it can be used as a reproducer. I suggest to use
> checkpoint as used by one of Bertrand's patches to ensure that the
> issue reproduces in every environment.

Reproducer was separated to the .txt file.

> > Sadly IS_INJECTION_POINT_ATTACHED() was introduced for PG18 so that the
> patch
> > could not backport for PG17 as-is.
> >
>
> We can use 'wait' mode API in PG17 as used in one of the tests
> (injection_points_attach('heap_update-before-pin', 'wait');) but I
> think it may be better to just leave testing active slots in
> backbranches because anyway the new development happens on HEAD and we
> want to ensure that no breakage happens there.

OK. I've attached a patch for PG17 as well. Commit messages for them were also
updated.

Best regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
reproducer.txt text/plain 637 bytes
PG17-v2-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch application/octet-stream 16.8 KB
PG16-v2-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch application/octet-stream 15.0 KB
v2-0001-Stabilize-035_standby_logical_decoding.pl-by-usin.patch application/octet-stream 6.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2025-03-28 09:05:48 Re: Assert failure in base_yyparse
Previous Message Amul Sul 2025-03-28 09:00:39 Re: NOT ENFORCED constraint feature