PostgreSQL's processes blocking each other are not detected as deadlock

From: Sorin Mircioiu <sorin(dot)mircioiu(at)gmail(dot)com>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: PostgreSQL's processes blocking each other are not detected as deadlock
Date: 2023-09-26 10:23:48
Message-ID: CAO-WFfkqwkqZ+qj0rqcjjg6UqL_34ZPXgC=fqquqjUfkoPt6Sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

OS: Oracle Linux Server 9.1
Kernel: Linux 5.15.0-100.96.32.el9uek.x86_64

PostgreSQL's version: 13.10
Configuration: deadlock_timeout: 1s, log_lock_waits: off

I found in the logs some detected deadlocks:

```
2023-09-22 16:54:19.425 UTC 650dc6ba.5732 497942 22322 ERROR: deadlock
detected
2023-09-22 16:54:19.425 UTC 650dc6ba.5732 497942 22322 DETAIL: Process
22322 waits for ExclusiveLock on tuple (0,48) of relation 415480 of
database 16873; blocked by process 13502.
Process 13502 waits for ShareLock on transaction 497938; blocked by
process 13482.
Process 13482 waits for ShareLock on transaction 497932; blocked by
process 21544.
Process 21544 waits for ExclusiveLock on tuple (0,48) of relation
415480 of database 16873; blocked by process 22322.
```

```
2023-09-22 16:54:20.292 UTC 650dc538.34aa 497938 13482 ERROR: deadlock
detected
2023-09-22 16:54:20.292 UTC 650dc538.34aa 497938 13482 DETAIL: Process
13482 waits for ShareLock on transaction 497932; blocked by process 21544.
Process 21544 waits for ExclusiveLock on tuple (0,48) of relation
415480 of database 16873; blocked by process 13502.
Process 13502 waits for ShareLock on transaction 497938; blocked by
process 13482.
```

Several hours later (after the detected deadlocks) I can see that two
processes from the above logs are mentioned in the pg_locks table.

According to official documentation (<
https://wiki.postgresql.org/wiki/Lock_Monitoring#.D0.A1ombination_of_blocked_and_blocking_activity>)
I can view the blocked and blocking activity using the following queries:

Query (locks)

```
select relation::regclass, * from pg_locks where not granted;
```

Output (locks)

```
-[ RECORD 1 ]------+--------------
relation |
locktype | transactionid
database |
relation |
page |
tuple |
virtualxid |
transactionid | 497932
classid |
objid |
objsubid |
virtualtransaction | 16/1248
pid | 13502
mode | ShareLock
granted | f
fastpath | f

-[ RECORD 2 ]------+--------------
relation |
locktype | transactionid
database |
relation |
page |
tuple |
virtualxid |
transactionid | 497941
classid |
objid |
objsubid |
virtualtransaction | 41/34
pid | 21544
mode | ShareLock
granted | f
fastpath | f
```

Query (blocking pids)

```
SELECT blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS current_statement_in_blocking_process
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid
= blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM
blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM
blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid
= blocking_locks.pid
WHERE NOT blocked_locks.granted;
```

Output (blocking pids):

```
-[ RECORD 1 ]-------------------------+-----------------------------------
blocked_pid | 13502
blocked_user | my_username
blocking_pid | 21544
blocking_user | my_username
blocked_statement | SELECT * FROM my_procedure_1()
current_statement_in_blocking_process | SELECT * FROM my_procedure_2()

-[ RECORD 2 ]-------------------------+-----------------------------------
blocked_pid | 21544
blocked_user | my_username
blocking_pid | 13502
blocking_user | my_username
blocked_statement | SELECT * FROM my_procedure_2()
current_statement_in_blocking_process | SELECT * FROM my_procedure_1()
```

Linux console output:

```
ps -ef | grep postgres | grep waiting

postgres 13502 972 0 Sep22 ? 00:00:01 postgres:
127.0.0.1(43736) SELECT waiting
postgres 21544 972 0 Sep22 ? 00:00:00 postgres:
127.0.0.1(41848) SELECT waiting
```

```
strace -p 13502
strace: Process 13502 attached
epoll_wait(96, ^Cstrace: Process 13502 detached
<detached ...>

strace -p 21544
strace: Process 21544 attached
epoll_wait(77, ^Cstrace: Process 21544 detached
<detached ...>
```

I understand that I have to modify the code in order to avoid the
deadlocks. I will do that.

I need answers to the following questions:

1. 13502 is blocked by 21544 and 21544 is blocked by 13502. Why PostgreSQL
didn't detect this situation as a deadlock ? The fact that I'm using
procedure calls could mislead PostgreSQL ?
2. The lock mode used is ShareLock (as mentioned in the locks output). The
official documentation (<
https://www.postgresql.org/docs/13/explicit-locking.html>) says that
ShareLock can be "Acquired by CREATE INDEX (without CONCURRENTLY)." I'm
not using "CREATE INDEX" or something related to indexes in my queries. The
question is, why is ShareLock used for the locking system in my situation ?

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message David Rowley 2023-09-26 13:42:32 Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower
Previous Message Richard Guo 2023-09-26 09:46:11 Re: BUG #18103: bugs of concurrent merge into when use different join plan