From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Rintaro(dot)Ikeda(at)nttdata(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: Undetected deadlock between client backend and startup processes on a standby (Previously, Undetected deadlock between primary and standby processes)
Date: 2024-03-10 20:43:11
Message-ID: ea96bc84-e242-4179-a440-9d4b8a7bae9f@enterprisedb.com
Lists: pgsql-bugs
On 3/4/24 09:35, Rintaro(dot)Ikeda(at)nttdata(dot)com wrote:
> Hi,
>
> I'm correcting my previous bug report [1] to give a more accurate
> description. The report demonstrated an undetected deadlock between a
> client backend and the startup process on a standby server. (The title
> of the previous report, "Undetected deadlock between primary and
> standby processes", was wrong; it should have been "Undetected
> deadlock between client backend and startup process on a standby
> server".)
>
> After following the procedure in my bug report [1], a recovery
> conflict arises because the tablespace that the startup process is
> trying to drop is still in use by a client backend on the standby.
> The pg_stat_activity output (shown below) implies a deadlock: the
> client backend waits for an AccessExclusiveLock to be released, while
> the startup process waits for the recovery conflict on the dropped
> tablespace to be resolved. This deadlock is not resolved even after
> deadlock_timeout passes.
>
> (Standby server)
> postgres=# select datid, datname, wait_event_type, wait_event, query, backend_type from pg_stat_activity ;
> datid | datname | wait_event_type | wait_event | query | backend_type
> -------+----------+-----------------+----------------------------+-------------------------------------------------------------------------------------------------+-------------------
> 5 | postgres | Lock | relation | SELECT * FROM t; | client backend
> | | IPC | RecoveryConflictTablespace | | startup
>
>
> This deadlock is similar to a previously identified and patched
> issue [2], which also involved an undetected deadlock between a
> backend process and recovery on a standby server. I think the
> deadlock described in this report should likewise be detected and
> resolved.
>
Thanks for the report.
So what are the steps to reproduce this? The previous message did all
kinds of stuff on the primary and then got stuck on pg_switch_wal() on
the primary, but this updated report seems to do stuff on the standby
and gets the lockup there.
It seems similar in the sense that it's about interaction between
recovery and a regular backend, but unfortunately
ResolveRecoveryConflictWithVirtualXIDs does not wait for a lock; it just
checks whether the XID is still running, so the wait is invisible to the
deadlock detector :-(
But it's still checked against max_standby_streaming_delay, which should
resolve the deadlock (unless set to -1 to allow infinite delays) at some
point, right?
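In other words, with the default settings the startup process should
eventually cancel the conflicting backend on its own; only an infinite
delay keeps the lockup forever. Something like this on the standby
(values just to illustrate the point):

    -- the default (30s) lets the conflict resolve after a while
    SHOW max_standby_streaming_delay;
    -- whereas -1 disables the cancellation and makes the wait unbounded
    ALTER SYSTEM SET max_standby_streaming_delay = -1;
    SELECT pg_reload_conf();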
Also, I'm not very familiar with ResolveRecoveryConflictWithVirtualXIDs,
but it seems to be doing a busy wait. I wonder if that's a good idea, but
that's independent of this bug report.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company