Re: Unstable tests for recovery conflict handling

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Unstable tests for recovery conflict handling
Date: 2022-07-26 18:16:11
Message-ID: 20220726181611.4xw3blxigqzsz4d4@alap3.anarazel.de
Lists: pgsql-committers pgsql-hackers

Hi,

On 2022-07-26 13:57:53 -0400, Tom Lane wrote:
> I happened to notice that while skink continues to fail off-and-on
> in 031_recovery_conflict.pl, the symptoms have changed! What
> we're getting now typically looks like [1]:
>
> [10:45:11.475](0.023s) ok 14 - startup deadlock: lock acquisition is waiting
> Waiting for replication conn standby's replay_lsn to pass 0/33FB8B0 on primary
> done
> timed out waiting for match: (?^:User transaction caused buffer deadlock with recovery.) at t/031_recovery_conflict.pl line 367.
>
> where absolutely nothing happens in the standby log, until we time out:
>
> 2022-07-24 10:45:11.452 UTC [1468367][client backend][2/4:0] LOG: statement: SELECT * FROM test_recovery_conflict_table2;
> 2022-07-24 10:45:11.472 UTC [1468547][client backend][3/2:0] LOG: statement: SELECT 'waiting' FROM pg_locks WHERE locktype = 'relation' AND NOT granted;
> 2022-07-24 10:48:15.860 UTC [1468362][walreceiver][:0] FATAL: could not receive data from WAL stream: server closed the connection unexpectedly
>
> So this is not a case of RecoveryConflictInterrupt doing the wrong thing:
> the startup process hasn't detected the buffer conflict in the first
> place.

I wonder if this, at least partially, could be due to the elog thing
I was complaining about nearby. I.e. we decide to FATAL as part of a
recovery conflict interrupt, and then during that ERROR out as part of
another recovery conflict interrupt (because nothing holds interrupts as
part of FATAL).
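
To make the suspected sequence concrete, here is a toy model of it. This is
only a sketch and not PostgreSQL code: the class, the method names, the
holdoff counter and the log strings are all hypothetical stand-ins. It
shows how a second conflict interrupt that arrives while FATAL processing
is under way can raise a new error unless something holds interrupts for
the duration.

```python
# Toy model only: every name here is hypothetical and does not mirror
# PostgreSQL internals.

class ConflictError(Exception):
    """Stands in for an ERROR raised while servicing a conflict interrupt."""


class Backend:
    def __init__(self, hold_during_fatal):
        self.hold_during_fatal = hold_during_fatal
        self.holdoff = 0        # > 0 means interrupts are deferred
        self.pending = False    # a conflict interrupt has been signalled
        self.log = []

    def check_for_interrupts(self):
        # Service a pending interrupt only when nothing holds interrupts.
        if self.pending and self.holdoff == 0:
            self.pending = False
            raise ConflictError("second conflict during exit")

    def fatal_exit(self):
        self.log.append("FATAL started")
        if self.hold_during_fatal:
            self.holdoff += 1   # defer interrupts while exiting
        self.pending = True     # another conflict arrives mid-exit
        self.check_for_interrupts()  # exit path reaches an interrupt check
        self.log.append("FATAL completed")


def run(hold_during_fatal):
    backend = Backend(hold_during_fatal)
    try:
        backend.fatal_exit()
    except ConflictError as exc:
        backend.log.append(f"ERROR: {exc}")
    return backend.log
```

In this model, run(False) never reaches "FATAL completed" because the
nested error cuts the exit path short, while run(True) does.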

Greetings,

Andres Freund
