From: | "Ryo Yamaji (Fujitsu)" <yamaji(dot)ryo(at)fujitsu(dot)com> |
---|---|
To: | "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org> |
Subject: | About row locking ordering |
Date: | 2022-11-24 10:06:54 |
Message-ID: | TYAPR01MB6073506ECCD7B8F51DA807F68A0F9@TYAPR01MB6073.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
Hi All.
When multiple sessions tried to acquire a row lock on the same tuple sequentially, I expected
that the first session waiting for the lock would acquire the lock first (FIFO). However, when we
actually tested it, there were cases where a session that was behind a first session acquired
a row lock ahead of first session.
Is this behavior is expected specifications or is it a bug?
The following is a list of the cases where the next session gets the lock first.
・Multiple sessions request row locks for the same tuple
・Update occurs for target tuple
The above behavior may have the following problems:
・If the lock acquiring order is FIFO in Postgres, it is not satisfied.
・Any sessions which cannot acquire a lock for a long time result in a long transaction
If the above behavior is a specification, I think it is necessary to document design considerations
when dealing with row locks.
[Condition]
・Version
- PostgreSQL 14.0 (compiled with LOCK_DEBUG)
・postgresql.conf
- log_lock_waits = on
- deadlock_timeout = 10
- log_line_prefix = '%m [%p:%l:%x] '
- trace_locks = on
- trace_lock_oidmin = 16000
[Table Definition]
create table t1 (col1 int, col2 int, col3 char (10));
insert into t1 values (1,0, ''); insert into t1 values (100, 9999, '');
[Application]
<a.sql> ---------------------------
BEGIN;
SELECT * FROM t1 WHERE col1 = 1 FOR UPDATE;
SELECT current_timestamp AS TRANSTART, clock_timestamp() AS NOW;
UPDATE t1 set col2 = :val, col3 = :valb WHERE col1 = 1;
SELECT pg_sleep(1);
COMMIT;
------------------------------------
[Test]
The test set is attached.
1. Run the following in 100 multiplex
for i in `seq 100`; do psql postgres -f ./a.sql -v val=${i} -v valb="'a.sh'" &; done
2.After 10 seconds (1. After completion of execution), run the following in 10 multiplexes
for i in `seq 10`; do ii=`expr ${i} + 900`; psql postgres -f ./a.sql -v val=${ii} -v valb="'b.sh'" &; done
All a.sh were expected to complete processing first (FIFO). However, b.sh completed the process
before part of a.sh.
[Log]
We are checking two types of logs.
1. ShareLock has one wait, the rest is in AccessExclusiveLock
1-1. Only 1369555 is aligned with ShareLock, the transaction ID obtained by 1369547, and the rest with
AccessExclusiveLock, the tuple obtained by 1369555.
This is similar to a pattern in which no updates have occurred to the tuple.
--------------------------------------------------------------
2022-10-26 01:20:08.881 EDT [1369555:19:0] LOG: process 1369555 still waiting for ShareLock on transaction 2501 after 10.072 ms
2022-10-26 01:20:08.881 EDT [1369555:20:0] DETAIL: Process holding the lock: 1369547. Wait queue: 1369555.
〜
2022-10-26 01:21:58.918 EDT [1369898:17:0] LOG: process 1369898 acquired AccessExclusiveLock on tuple (1, 0) of relation 16546 of database 13779 after 10.321 ms
2022-10-26 01:21:58.918 EDT [1369898:18:0] DETAIL: Process holding the lock: 1369555. Wait queue: 1369558, 1369561, 1369564, 1369567, 1369570, 1369573, 1369576, ...
--------------------------------------------------------------
2. All processes wait with ShareLock
2-1. With 1369558 holding the t1 (0, 4) lock, the queue head is 1369561.
--------------------------------------------------------------
2022-10-26 01:22:27.230 EDT [1369623:46:2525] LOG: process 1369623 still waiting for ShareLock on transaction 2504 after 10.133 msprocess 1369623 still waiting for ShareLock on transaction 2504 after 10.133 ms
2022-10-26 01:22:27.242 EDT [1369877:47:2604] DETAIL: Process holding the lock: 1369558. Wait queue: 1369561, 1369623, 1369626, ...
--------------------------------------------------------------
2-2. When 1369558 locks are released, the first 1369561 in the Wait queue was expected to acquire the lock,
but the process actually acquired 1369787
--------------------------------------------------------------
2022-10-26 01:22:28.237 EDT [1369623:63:2525] LOG: process 1369623 still waiting for ShareLock on transaction 2577 after 10.028 ms
2022-10-26 01:22:28.237 EDT [1369623:64:2525] DETAIL: Process holding the lock: 1369787. Wait queue: 1369623, 1369610, 1369614, 1369617, 1369620.
--------------------------------------------------------------
2-3. Checking that the 1369561 is rearranging.
--------------------------------------------------------------
2022-10-26 01:22:28.237 EDT [1369629:64:2527] DETAIL: Process holding the lock: 1369623. Wait queue: 1369629, 1369821, 1369644, ... 1369561, ...
--------------------------------------------------------------
Regerds, Ryo
Attachment | Content-Type | Size |
---|---|---|
testset.zip | application/x-zip-compressed | 1.2 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Rama Krishnan | 2022-11-24 12:01:13 | How to select unique records in PostgreSQL |
Previous Message | zxwsbg | 2022-11-24 08:03:37 | Re:Why not set builder->next_phase_at=InvalidTransactionId in SnapBuildRestore function? |