Re: BUG #18404: Select from pg_stat_activity in a plpgsql block can lead to a global locking issue

From: Noah Misch <noah(at)leadboat(dot)com>
To: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18404: Select from pg_stat_activity in a plpgsql block can lead to a global locking issue
Date: 2024-04-07 01:01:46
Message-ID: 20240407010146.9b@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Mar 22, 2024 at 01:00:01PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 18404
> Logged by: Alexander Lakhin
> Email address: exclusion(at)gmail(dot)com
> PostgreSQL version: 16.2
> Operating system: Ubuntu 22.04
> Description:
>
> The following script:
> createuser u
>
> cat << 'EOF' | psql -U u &
> DO '
> BEGIN
> PERFORM count(*) FROM pg_stat_activity;
> RAISE NOTICE ''sleeping...'';
> PERFORM pg_sleep(1800);
> END';
> EOF
> sleep 0.5
>
> cat << EOF | psql &
> VACUUM;
> REINDEX SYSTEM;
> EOF
> sleep 0.5
>
> cat << EOF | psql -U u
> SELECT 1
> EOF
>
> ends with the last session stuck on startup:
> law 3318525 3318511 0 15:00 ? 00:00:00 postgres: u regression
> [local] DO
> law 3318530 3318511 0 15:00 ? 00:00:00 postgres: law regression
> [local] REINDEX waiting
> law 3318533 3318511 0 15:00 ? 00:00:00 postgres: u regression
> [local] startup waiting
> law 3318628 3318511 0 15:01 ? 00:00:00 postgres: autovacuum
> worker waiting
> law 3318654 3318511 0 15:01 ? 00:00:00 postgres: autovacuum
> worker waiting
> law 3318676 3318511 0 15:02 ? 00:00:00 postgres: autovacuum
> worker waiting
>
> The backtrace of the last session backend is:
[waiting for AccessShareLock on pg_authid_rolname_index]

> The backtrace of the REINDEXing backend is:
[waiting for AccessExclusiveLock on pg_authid_rolname_index]

As a partial workaround, you can set lock_timeout before REINDEX SYSTEM.

Each backend is doing the right thing in isolation, under current objectives.
The first backend holds AccessShareLock on pg_authid_rolname_index, because
pg_stat_activity is a view that joins to pg_authid. We follow
https://en.wikipedia.org/wiki/Two-phase_locking#Strong_strict_two-phase_locking
and hold relation locks till end of xact. That could be the easiest part to
change, but it's still hard. REINDEX acquires AccessExclusiveLock to make
trivial the code for making readers transition to the replacement index.
Backend startup needs to read catalogs; that is least likely to change. I
don't see a general way to keep the rest of the system productive when
AccessExclusiveLock of a system relation is happening. I can imagine mostly
one-off changes, like making pg_stat_activity release locks earlier.

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Mao Jiayin 2024-04-07 03:32:41 Re: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit
Previous Message Jeff Janes 2024-04-06 23:25:08 Re: BUG #18423: suboptimal query plan is used when ordering by an indexed field with limit