Re: Let's invent a function to report lock-wait-blocking PIDs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Let's invent a function to report lock-wait-blocking PIDs
Date: 2013-03-20 18:29:06
Message-ID: CA+TgmoYBMR-ySG1h-sg_deGVqb2k--KwNy0P+CuOwWASxVh7pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 20, 2013 at 2:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I was just looking into why the -DCLOBBER_CACHE_ALWAYS buildfarm
> critters aren't managing to run the new "timeouts" isolation test
> successfully, despite very generous timeouts. The answer is that
> 2 seconds isn't quite enough time to parse+plan+execute the query
> that isolationtester uses to see if the current test session is
> blocked on a lock, if CLOBBER_CACHE_ALWAYS is on. Now, that query
> is totally horrible:
>
> appendPQExpBufferStr(&wait_query,
> "SELECT 1 FROM pg_locks holder, pg_locks waiter "
> "WHERE NOT waiter.granted AND waiter.pid = $1 "
> "AND holder.granted "
> "AND holder.pid <> $1 AND holder.pid IN (");
> /* The spec syntax requires at least one session; assume that here. */
> appendPQExpBuffer(&wait_query, "%s", backend_pids[1]);
> for (i = 2; i < nconns; i++)
> appendPQExpBuffer(&wait_query, ", %s", backend_pids[i]);
> appendPQExpBufferStr(&wait_query,
> ") "
>
> "AND holder.mode = ANY (CASE waiter.mode "
> "WHEN 'AccessShareLock' THEN ARRAY["
> "'AccessExclusiveLock'] "
> "WHEN 'RowShareLock' THEN ARRAY["
> "'ExclusiveLock',"
> "'AccessExclusiveLock'] "
> "WHEN 'RowExclusiveLock' THEN ARRAY["
> "'ShareLock',"
> "'ShareRowExclusiveLock',"
> "'ExclusiveLock',"
> "'AccessExclusiveLock'] "
> "WHEN 'ShareUpdateExclusiveLock' THEN ARRAY["
> "'ShareUpdateExclusiveLock',"
> "'ShareLock',"
> "'ShareRowExclusiveLock',"
> "'ExclusiveLock',"
> "'AccessExclusiveLock'] "
> "WHEN 'ShareLock' THEN ARRAY["
> "'RowExclusiveLock',"
> "'ShareUpdateExclusiveLock',"
> "'ShareRowExclusiveLock',"
> "'ExclusiveLock',"
> "'AccessExclusiveLock'] "
> "WHEN 'ShareRowExclusiveLock' THEN ARRAY["
> "'RowExclusiveLock',"
> "'ShareUpdateExclusiveLock',"
> "'ShareLock',"
> "'ShareRowExclusiveLock',"
> "'ExclusiveLock',"
> "'AccessExclusiveLock'] "
> "WHEN 'ExclusiveLock' THEN ARRAY["
> "'RowShareLock',"
> "'RowExclusiveLock',"
> "'ShareUpdateExclusiveLock',"
> "'ShareLock',"
> "'ShareRowExclusiveLock',"
> "'ExclusiveLock',"
> "'AccessExclusiveLock'] "
> "WHEN 'AccessExclusiveLock' THEN ARRAY["
> "'AccessShareLock',"
> "'RowShareLock',"
> "'RowExclusiveLock',"
> "'ShareUpdateExclusiveLock',"
> "'ShareLock',"
> "'ShareRowExclusiveLock',"
> "'ExclusiveLock',"
> "'AccessExclusiveLock'] END) "
>
> "AND holder.locktype IS NOT DISTINCT FROM waiter.locktype "
> "AND holder.database IS NOT DISTINCT FROM waiter.database "
> "AND holder.relation IS NOT DISTINCT FROM waiter.relation "
> "AND holder.page IS NOT DISTINCT FROM waiter.page "
> "AND holder.tuple IS NOT DISTINCT FROM waiter.tuple "
> "AND holder.virtualxid IS NOT DISTINCT FROM waiter.virtualxid "
> "AND holder.transactionid IS NOT DISTINCT FROM waiter.transactionid "
> "AND holder.classid IS NOT DISTINCT FROM waiter.classid "
> "AND holder.objid IS NOT DISTINCT FROM waiter.objid "
> "AND holder.objsubid IS NOT DISTINCT FROM waiter.objsubid ");
>
> This is way more knowledge than we (should) want a client to embed about
> which lock types block which others. What's worse, it's still wrong.
> The query will find cases where one of the test sessions *directly*
> blocks another one, but not cases where the blockage is indirect.
> For example, consider that A holds AccessShareLock, B is waiting for
> AccessExclusiveLock on the same object, and C is queued up behind B
> for another AccessShareLock. This query will not think that C is
> blocked, not even if B is part of the set of sessions of interest
> (because B will show the lock as not granted); but especially so if
> B is not part of the set.
>
> I think that such situations may not arise in the specific context that
> isolationtester says it's worried about, which is to disregard waits for
> locks held by autovacuum. But in general, you can't reliably tell who's
> blocking whom with a query like this.
>
> If isolationtester were the only market for this type of information,
> maybe it wouldn't be worth worrying about. But I'm pretty sure that
> there are a *lot* of monitoring applications out there that are trying
> to extract who-blocks-whom information from pg_locks. I hadn't realized
> before quite how painful it is to do that, even incorrectly.
>
> I propose that we should add a backend function that simplifies this
> type of query. The API that comes to mind is (name subject to
> bikeshedding)
>
> pg_blocking_pids(pid int) returns int[]
>
> defined to return NULL if the argument isn't the PID of any backend or
> that backend isn't waiting for a lock, and otherwise an array of the
> PIDs of the backends that are blocking it from getting the lock.
> I would compute the array as
>
> PIDs of backends already holding conflicting locks,
> plus PIDs of backends requesting conflicting locks that are
> ahead of this one in the lock's wait queue,
> plus PIDs of backends that block the latter group of PIDs
> (ie, are holding locks conflicting with their requests,
> or are awaiting such locks and are ahead of them in the queue)
>
> There would be some cases where this definition would be too expansive,
> ie we'd release the waiter after only some of the listed sessions had
> released their lock or request. (That could happen for instance if we
> concluded we had to move up the waiter's request to escape a deadlock.)
> But I think that it's better to err in that direction than to
> underestimate the set of relevant PIDs.
>
> In the isolationtester use-case, we'd get the right answer by testing
> whether this function's result has any overlap with the set of PIDs of
> test sessions, ie
>
> select pg_blocking_pids($1) && array[pid1, pid2, pid3, ...]

Sounds excellent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2013-03-20 18:34:21 Re: machine-parseable object descriptions
Previous Message Robert Haas 2013-03-20 18:24:50 Re: Materialized view assertion failure in HEAD