pgsql: Fix another race-condition-ish issue in recovery/t/001_stream_re

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Fix another race-condition-ish issue in recovery/t/001_stream_re
Date: 2017-07-06 03:59:25
Message-ID: E1dSxwr-0002At-WB@gemulon.postgresql.org
Lists: pgsql-committers

Fix another race-condition-ish issue in recovery/t/001_stream_rep.pl.

Buildfarm members hornet and sungazer have shown multiple instances of
"Failed test 'xmin of non-cascaded slot with hs feedback has changed'".
The reason seems to be that the test is checking the current xmin of the
master server's replication slot against a past xmin of the first slave
server's replication slot. Even though the latter slot is downstream of
the former, it's possible for its reported xmin to be ahead of the former's
reported xmin, because those numbers are updated whenever the respective
downstream walreceiver feels like it (see logic in WalReceiverMain).
Instrumenting this test shows that indeed the slave slot's xmin does often
advance before the master's does, especially if an autovacuum transaction
manages to occur during the relevant window. If we happen to capture such
an advanced xmin as $xmin, then the subsequent wait_slot_xmins call can
fall through before the master's xmin has advanced at all, and then if it
advances before the get_slot_xmins call, we can get the observed failure.
Yeah, that's a bit of a long chain of deduction, but it's hard to explain
any other way how the test can get past an "xmin <> '$xmin'" check only
to have the next query find that xmin does equal $xmin.

Fix by keeping separate images of the master and slave slots' xmins
and testing their has-xmin-advanced conditions independently.
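
The failure sequence and the fix can be illustrated with a small toy model.
This is only a sketch, not the actual Perl test code: the xmin values (100,
101) and all names below are hypothetical, and the real test polls the
servers rather than using fixed snapshots.

```python
# Toy model of the race described above. All values are made up for
# illustration; the real test is written in Perl and queries live servers.

def has_advanced(current_xmin, baseline_xmin):
    """Mirror of the test's "xmin <> '$xmin'" condition."""
    return current_xmin != baseline_xmin

# Hypothetical timeline of reported slot xmins:
master_baseline = 100    # master slot's xmin when the baselines are taken
standby_baseline = 101   # standby slot's xmin, already advanced (e.g. autovacuum)
master_at_wait = 100     # master slot has not advanced when the wait polls
master_at_check = 101    # master slot advances just before the final check

# Old behaviour: a single shared baseline, which ends up holding the
# standby slot's value.
shared_baseline = standby_baseline
old_wait_falls_through = has_advanced(master_at_wait, shared_baseline)
old_final_check_passes = has_advanced(master_at_check, shared_baseline)

# Fixed behaviour: the master slot is compared against its own baseline.
new_wait_falls_through = has_advanced(master_at_wait, master_baseline)
new_final_check_passes = has_advanced(master_at_check, master_baseline)

print("old: wait falls through early =", old_wait_falls_through,
      "| final check passes =", old_final_check_passes)
print("new: wait falls through early =", new_wait_falls_through,
      "| final check passes =", new_final_check_passes)
```

With the shared baseline, the wait falls through while the master slot is
still at its old xmin, and the final check then sees an xmin equal to the
baseline. With per-slot baselines, the wait cannot be satisfied until the
master slot itself has moved.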

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ec86af917551f52246848dd148885df034273f3d

Modified Files
--------------
src/test/recovery/t/001_stream_rep.pl | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
