Re: Timing out A Blocker Based on Time or Count of Waiters

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Fred Habash <fmhabash(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Timing out A Blocker Based on Time or Count of Waiters
Date: 2024-03-26 01:52:48
Message-ID: CAHyXU0x8qWKkfOrMep6rA3H1DnnxS4Hf9NnXfqCb7S1aZrt9Rg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, Mar 22, 2024 at 11:25 AM Fred Habash <fmhabash(at)gmail(dot)com> wrote:

> Facing an issue where sometimes humans login to a database and run DDL
> statements causing a long locking tree of over 1000 waiters. As a
> workaround, we asked developers to always start their DDL sessions
> with 'SET lock_timeout = 'Xs'.
>
> I reviewed the native lock timeout parameter in Postgres and found 7. None
> seem to be related to blocker timeouts directly.
>
> idle_in_transaction_session_timeout
> idle_session_timeout
> lock_timeout: How long a session waits for a lock
> statement_timeout
> authentication_timeout
> deadlock_timeout
> log_lock_waits
>
> Instead, I put together a quick procedure that counts waiter sessions for
> a given blocker and terminates it if waiter count exceeds a threshold.
>
> Is there not a native way to ...
> 1. Automatically time out a blocker
> 2. A metric that shows how many waiters for a blocker?
>

I guess this probably does not belong in the native codebase because in
most real world scenarios with contention you would end up with priority
inversion or a situation where no work gets done. With current locking
rules, theoretically the work queue would always clear (assuming the locker
doesn't hold the transaction indefinitely), where with your setting enabled
it might not always assume the locker retries.

In your case, a hand written 'unblocker' script might be the way to go, or
(probably better) encourage patterns where critical tables are not blocked,
say by building up a scratch table and swapping in on a separate
transaction. Reducing contention rather than mitigating the symptoms of
it, is *always* a good thing.

merlin

In response to

Browse pgsql-general by date

  From Date Subject
Next Message 赵宇鹏 (宇彭) 2024-03-26 06:23:38 回复:Orphan files filling root partition after crash
Previous Message Justin Clift 2024-03-25 23:43:00 Re: PostgreSQL as advanced job queuing system