From: | "Mike Klaas" <mike(at)superhuman(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Cc: | "Engineering-archive" <engineering-archive(at)superhuman(dot)com> |
Subject: | Help understanding SIReadLock growing without bound on completed transaction |
Date: | 2020-05-21 19:48:22 |
Message-ID: | kah64cuu.911fba7b-28c9-4c00-93cc-7557777f9e58@we.are.superhuman.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
We recently experienced an issue where a transaction that was finished and no longer existed kept slowly accumulating SIReadLocks over the period of a week. The only way we could remove the locks was by restarting postgresql.
The entries in pg_locks resembled:
mode: SIReadLock
locktype: page
relation::regclass::text: <table>_pkey
virtualtransaction: 36/296299968
granted:t
pid: 2263461
count(1): 5559 (when grouped)
Note that this pid did not exist in pg_stat_activity. I understand that it is normal for SSI locks to persist after a transaction is finished. There are however two aspects to this that I don't understand:
* It's my understanding that these locks should be cleared when there are no conflicting transactions. These locks had existed for > 1 week and we have no transactions that last more than a few seconds (the oldest transaction in pg_stat_activity is always < 1minute old).
* Why would a transaction that is finished continue accumulating locks over time?
If it helps, here is some more information about the state of the system:
* There were a total of six pids in pg_locks that didn't exist in pg_stat_activity. They held a variety of SIReadLocks, but they weren't increasing in number over time. I'm not sure how long they were present; I only know that the problematic pid existed for a week due to its continual growth reflecting in our internal lock monitoring system.
* I tried finding overlapping SIReadLocks (see query below), but none were returned (I realize that the SSI conflict resolution algo is much more involved than this simple query)
* PG version: 9.6.17
I would appreciate any hints of what I could've done to investigate this further or how I could've resolved the issue without restarting the db (and thus experiencing downtime).
thank you in advance,
-Mike
SELECT
waiting.locktype AS w_locktype,
LEFT(waiting.relation::regclass::text,25) AS waiting_table,
COALESCE(waiting_stm.query,'?') AS w_query,
waiting.page ( http://waiting.page/ ) AS w_page,
waiting.tuple AS w_tuple,
waiting.pid ( http://waiting.pid/ ) AS w_pid,
other.locktype AS o_locktype,
LEFT(other.relation::regclass::text,15) AS other_table,
LEFT(COALESCE(other_stm.query, '?'), 50) AS other_query,
other.page ( http://other.page/ ) AS o_page,
other.tuple AS o_tuple,
other.pid ( http://other.pid/ ) AS other_pid,
other.GRANTED AS o_granted
FROM
pg_catalog.pg_locks AS waiting
LEFT JOIN
pg_catalog.pg_stat_activity AS waiting_stm
ON waiting_stm.pid ( http://waiting_stm.pid/ ) = waiting.pid ( http://waiting.pid/ )
JOIN
pg_catalog.pg_locks AS other
ON (
(
waiting."database" = other."database"
AND waiting.relation = other.relation
and waiting.locktype = other.locktype
AND ( CASE WHEN other.locktype = 'page' THEN waiting.page ( http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ )
WHEN other.locktype = 'tuple' THEN waiting.page ( http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ ) and waiting.tuple IS NOT DISTINCT FROM other.tuple
ELSE true END
)
)
OR waiting.transactionid = other.transactionid
) AND waiting.pid ( http://waiting.pid/ ) <> other.pid ( http://other.pid/ )
LEFT JOIN
pg_catalog.pg_stat_activity AS other_stm
ON other_stm.pid ( http://other_stm.pid/ ) = other.pid ( http://other.pid/ )
WHERE waiting.pid ( http://waiting.pid/ ) IN (2263461, 2263276, 2263283, 2263284, 2263459, 2263527 )
From | Date | Subject | |
---|---|---|---|
Next Message | Sándor Daku | 2020-05-21 20:12:46 | Re: Table partitioning for cloud service? |
Previous Message | Karsten Hilbert | 2020-05-21 19:33:12 | Re: Strategy for materialisation and centralisation of data |