From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Re: [COMMITTERS] pgsql: Add some isolation tests for deadlock detection and resolution. |
Date: | 2016-02-15 21:29:26 |
Message-ID: | 1056.1455571766@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
Andres Freund <andres(at)anarazel(dot)de> writes:
> I wonder if we shouldn't just expose a 'which pid is process X waiting
> for' API, implemented serverside. That's generally really useful, and
> looks like it's actually going to be less complicated than that
> query... And it's surely going to be faster.
Attached is a draft patch for a new function that reports the set of PIDs
directly blocking a given PID (that is, holding or awaiting conflicting
locks on the lockable object it's waiting for).
I replaced isolationtester's pg_locks query with this, and found that
it's about 9X faster in a normal build, and 3X faster with
CLOBBER_CACHE_ALWAYS turned on. That would give us some nice headroom
for the isolation tests with CLOBBER_CACHE_ALWAYS animals. (Note that
in view of
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2016-02-14%2007%3A38%3A37
we still need to do *something* about the speed of the new deadlock-hard
test; this patch could avoid the need to dumb down or slow down that test
even further.)
Not to be neglected also is that (I believe) this gives the right answer,
whereas isolationtester's existing query is currently completely broken by
parallel queries, and it doesn't understand non-conflicting lock modes
either. (It did, at least partially, before commit 38f8bdcac4982215;
I am not sure that taking out the mode checks was a good idea. But
putting them back would make the query slower yet.)
This is WIP, in part because I've written no user docs for the new
function, and in part because I think people might want to bikeshed the
API. What is here is:
"pg_blocker_pids(integer) returns integer[]" returns a possibly-empty
array of the PIDs of backend processes that block the backend with
specified PID. You get an empty array, not an error, if the argument
isn't a valid backend PID or that backend isn't waiting. In parallel
query situations, the output includes PIDs that are blocking any PID
in the given process's lock group, and what is reported is always
the PID of the lock group leader for whichever process is kdoing the
blocking. Also, in parallel query situations, the same PID might appear
multiple times in the output; it didn't seem worth trying to eliminate
duplicates.
Reasonable API change requests might include returning a rowset rather
than an array and returning more data per lock conflict. I've not
bothered with either thing here because isolationtester doesn't care
and it would make the query somewhat slower for isolationtester's
usage (though, probably, not enough slower to really matter).
It should also be noted that I've not really tested the parallel
query aspects of this, because I'm not sure how to create a conflicting
lock request in a parallel worker. However, if I'm not still
misunderstanding the new semantics of the lock data structures, that
aspect of it seems unlikely to be wrong.
Comments?
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
pg_blocker_pids.patch | text/x-diff | 26.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2016-02-15 22:01:12 | pgsql: Allow SetHintBits() to succeed if the buffer's LSN is new enough |
Previous Message | Joe Conway | 2016-02-15 21:22:28 | pgsql: Correct Copyright year from 2015 to 2016 |
From | Date | Subject | |
---|---|---|---|
Next Message | Kevin Grittner | 2016-02-15 22:03:46 | Re: WIP: Detecting SSI conflicts before reporting constraint violations |
Previous Message | Bruce Momjian | 2016-02-15 21:13:13 | Re: Freeze avoidance of very large table. |