Help understanding SIReadLock growing without bound on completed transaction

From: "Mike Klaas" <mike(at)superhuman(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: "Engineering-archive" <engineering-archive(at)superhuman(dot)com>
Subject: Help understanding SIReadLock growing without bound on completed transaction
Date: 2020-05-21 19:48:22
Message-ID: kah64cuu.911fba7b-28c9-4c00-93cc-7557777f9e58@we.are.superhuman.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

We recently experienced an issue where a transaction that was finished and no longer existed kept slowly accumulating SIReadLocks over the period of a week.  The only way we could remove the locks was by restarting postgresql.

The entries in pg_locks resembled:

mode: SIReadLock

locktype: page

relation::regclass::text: <table>_pkey

virtualtransaction: 36/296299968

granted:t

pid: 2263461

count(1): 5559 (when grouped)

Note that this pid did not exist in pg_stat_activity.  I understand that it is normal for SSI locks to persist after a transaction is finished.  There are however two aspects to this that I don't understand:

* It's my understanding that these locks should be cleared when there are no conflicting transactions.  These locks had existed for > 1 week and we have no transactions that last more than a few seconds (the oldest transaction in pg_stat_activity is always < 1minute old).

* Why would a transaction that is finished continue accumulating locks over time?

If it helps, here is some more information about the state of the system:

* There were a total of six pids in pg_locks that didn't exist in pg_stat_activity.  They held a variety of SIReadLocks, but they weren't increasing in number over time.  I'm not sure how long they were present; I only know that the problematic pid existed for a week due to its continual growth reflecting in our internal lock monitoring system.

* I tried finding overlapping SIReadLocks (see query below), but none were returned (I realize that the SSI conflict resolution algo is much more involved than this simple query)

* PG version: 9.6.17

I would appreciate any hints of what I could've done to investigate this further or how I could've resolved the issue without restarting the db (and thus experiencing downtime).

thank you in advance,
-Mike

SELECT

waiting.locktype AS w_locktype,

LEFT(waiting.relation::regclass::text,25) AS waiting_table,

COALESCE(waiting_stm.query,'?') AS w_query,

waiting.page ( http://waiting.page/ ) AS w_page,

waiting.tuple AS w_tuple,

waiting.pid ( http://waiting.pid/ ) AS w_pid,

other.locktype AS o_locktype,

LEFT(other.relation::regclass::text,15) AS other_table,

LEFT(COALESCE(other_stm.query, '?'), 50) AS other_query,

other.page ( http://other.page/ ) AS o_page,

other.tuple AS o_tuple,

other.pid ( http://other.pid/ ) AS other_pid,

other.GRANTED AS o_granted

FROM

pg_catalog.pg_locks AS waiting

LEFT JOIN

pg_catalog.pg_stat_activity AS waiting_stm

ON waiting_stm.pid ( http://waiting_stm.pid/ ) = waiting.pid ( http://waiting.pid/ )

JOIN

pg_catalog.pg_locks AS other

ON (

(

waiting."database" = other."database"

AND waiting.relation = other.relation

and waiting.locktype = other.locktype

AND ( CASE WHEN other.locktype = 'page' THEN waiting.page ( http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ )

WHEN other.locktype = 'tuple' THEN waiting.page ( http://waiting.page/ ) IS NOT DISTINCT FROM other.page ( http://other.page/ ) and waiting.tuple IS NOT DISTINCT FROM other.tuple

ELSE true END

)

)

OR waiting.transactionid = other.transactionid

) AND waiting.pid ( http://waiting.pid/ ) <> other.pid ( http://other.pid/ )

LEFT JOIN

pg_catalog.pg_stat_activity AS other_stm

ON other_stm.pid ( http://other_stm.pid/ ) = other.pid ( http://other.pid/ )

WHERE waiting.pid ( http://waiting.pid/ ) IN (2263461, 2263276, 2263283, 2263284, 2263459, 2263527 )

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sándor Daku 2020-05-21 20:12:46 Re: Table partitioning for cloud service?
Previous Message Karsten Hilbert 2020-05-21 19:33:12 Re: Strategy for materialisation and centralisation of data