From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>, Victor Yegorov <vyegorov(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Deadlock between backend and recovery may not be detected |
Date: | 2020-12-16 17:15:00 |
Message-ID: | 0bba5bf2-cd08-1038-09bd-b95350e89f0d@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020/12/16 23:28, Drouvot, Bertrand wrote:
> Hi,
>
> On 12/16/20 2:36 PM, Victor Yegorov wrote:
>>
>> *CAUTION*: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>>
>>
>> ср, 16 дек. 2020 г. в 13:49, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com>>:
>>
>> After doing this procedure, you can see the startup process and backend
>> wait for the table lock each other, i.e., deadlock. But this deadlock remains
>> even after deadlock_timeout passes.
>>
>> This seems a bug to me.
>>
> +1
>
>>
>> > * Deadlocks involving the Startup process and an ordinary backend process
>> > * will be detected by the deadlock detector within the ordinary backend.
>>
>> The cause of this issue seems that ResolveRecoveryConflictWithLock() that
>> the startup process calls when recovery conflict on lock happens doesn't
>> take care of deadlock case at all. You can see this fact by reading the above
>> source code comment for ResolveRecoveryConflictWithLock().
>>
>> To fix this issue, I think that we should enable STANDBY_DEADLOCK_TIMEOUT
>> timer in ResolveRecoveryConflictWithLock() so that the startup process can
>> send PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal to the backend.
>> Then if PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK signal arrives,
>> the backend should check whether the deadlock actually happens or not.
>> Attached is the POC patch implimenting this.
>>
> good catch!
>
> I don't see any obvious reasons why the STANDBY_DEADLOCK_TIMEOUT shouldn't be set in ResolveRecoveryConflictWithLock() too (it is already set in ResolveRecoveryConflictWithBufferPin()).
>
> So + 1 to consider this as a bug and for the way the patch proposes to fix it.
Thanks Victor and Bertrand for agreeing!
Attached is the updated version of the patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
recovery_conflict_lock_deadlock_v2.patch | text/plain | 8.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Justin Pryzby | 2020-12-16 17:22:23 | Re: pg_upgrade test for binary compatibility of core data types |
Previous Message | Peter Eisentraut | 2020-12-16 17:07:08 | Re: SELECT INTO deprecation |