Re: why can the isolation tester handle only one waiting process?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: why can the isolation tester handle only one waiting process?
Date: 2015-09-08 21:11:36
Message-ID: CA+TgmoaeRPfXMRgZJO-pxa+-sggE-ofUCTxpNGcOz9ckE5KfGw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 17, 2015 at 5:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Good idea. Here's an updated patch series that takes that approach.
> It cancels any query after 60 seconds of waiting, and if the query
> doesn't respond to the cancel, then it bails out completely after 75
> seconds (i.e. 15 seconds after attempting the cancel).
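The cancel-then-bail-out policy quoted above can be sketched as a small decision function. This is a hypothetical Python illustration, not isolationtester's actual C code. The names wait_action, CANCEL_AFTER_S and BAIL_AFTER_S are invented here, and only the 60- and 75-second thresholds come from the message.

```python
# Hypothetical sketch of the timeout policy described above; names are
# illustrative and do not come from isolationtester itself.
CANCEL_AFTER_S = 60  # send a cancel request after this many seconds of waiting
BAIL_AFTER_S = 75    # abort the whole run if the query is still waiting here


def wait_action(elapsed_s: float, cancel_sent: bool) -> str:
    """Decide what to do about a step that has been waiting elapsed_s seconds."""
    if elapsed_s >= BAIL_AFTER_S:
        return "bail"    # the query ignored the cancel; give up entirely
    if elapsed_s >= CANCEL_AFTER_S and not cancel_sent:
        return "cancel"  # try cancelling the stuck query, once
    return "wait"        # keep polling


print(wait_action(10, False))  # wait
print(wait_action(60, False))  # cancel
print(wait_action(75, True))   # bail
```

Separating "cancel" from "bail" gives a query 15 seconds to react to the cancel before the run is abandoned.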

Here's an updated patch series with some more improvements to the
isolationtester code, and some better test cases. I now have a test
for (a) a "simple" deadlock, involving a lock upgrade scenario where
the process seeking the upgrade must jump the wait queue; (b) a hard
deadlock; and (c) a soft deadlock that can be resolved by reordering
the wait queue. According to lcov, these tests cover most of
deadlock.c: 10 of its 11 functions (all but GetBlockingAutoVacuumPgproc)
and 246 of its 291 lines (about 85%). That's clearly an improvement
over the status quo, but I'm having a hard time feeling happy about it,
because it's really only testing the easy cases.
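In deadlock.c's terminology, a "hard" edge in the waits-for graph means a process is waiting for another that already holds a conflicting lock, while a "soft" edge means it is merely queued behind another waiter whose request conflicts. A cycle made only of hard edges cannot be resolved by rearranging wait queues. The following is a minimal, hypothetical Python model of that distinction; the graph representation and the function names are assumptions, not PostgreSQL's data structures.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: graph[p] lists (q, kind) meaning "p waits for q",
# where kind is "hard" (q holds a conflicting lock) or "soft"
# (q is ahead of p in a wait queue with a conflicting request).
Graph = Dict[str, List[Tuple[str, str]]]
Edge = Tuple[str, str, str]  # (waiter, waited-for, kind)


def find_cycle(graph: Graph, start: str) -> Optional[List[Edge]]:
    """Return the edges of some cycle through `start`, or None if there is none."""
    visited = {start}

    def dfs(node: str, path: List[Edge]) -> Optional[List[Edge]]:
        for nxt, kind in graph.get(node, []):
            edge = (node, nxt, kind)
            if nxt == start:
                return path + [edge]  # closed the loop back to start
            if nxt not in visited:
                visited.add(nxt)
                found = dfs(nxt, path + [edge])
                if found is not None:
                    return found
        return None

    return dfs(start, [])


def classify(cycle: List[Edge]) -> str:
    """A cycle of only hard edges cannot be fixed by reordering wait queues."""
    return "hard" if all(kind == "hard" for _, _, kind in cycle) else "soft"


# A holds lock 1 and wants lock 2; B holds lock 2 and wants lock 1.
hard = {"A": [("B", "hard")], "B": [("A", "hard")]}
# A is queued behind B in a lock queue (soft); B waits for a lock A holds (hard).
soft = {"A": [("B", "soft")], "B": [("A", "hard")]}
print(classify(find_cycle(hard, "A")))  # hard
print(classify(find_cycle(soft, "A")))  # soft
```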

I can't construct a case where reversing any single soft edge fails to
resolve the deadlock immediately (see the end of
TestConfigurationRecurse); the first edge tried always works. Moving a
process that would otherwise deadlock ahead of conflicting waiters
seems to be an extremely effective way of resolving deadlocks. For it
to fail, reversing one of the soft edges in the waits-for graph must
create a new cycle. But that seems to be quite hard to make happen:
the new edge created by the reversal points to the process that
jumped ahead in the queue. For that reversed edge to be part of a
cycle, the queue-jumping process has to be directly or indirectly
waiting for some other process it jumped over. But, clearly, it's
only waiting for processes that are *still ahead* of it in the queue,
and it would have had to wait for those processes whether it had
skipped ahead in the queue or not. So perhaps a test case here would
involve a process that skips forward in the lock queue, but not far
enough? But I haven't been able to figure it out.
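The single-reversal argument can be played out on a toy graph. The sketch below is a simplified, hypothetical model, not TestConfigurationRecurse itself: it tries reversing each soft edge p->q on its own (moving p ahead of q, so q now waits behind p) and reports the first reversal that leaves the graph acyclic. PostgreSQL's real search also accumulates constraints recursively, which this sketch omits, and all names here are invented for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical model: graph[p] lists (q, kind) meaning "p waits for q".
Graph = Dict[str, List[Tuple[str, str]]]


def has_cycle(graph: Graph) -> bool:
    """Three-colour DFS cycle check over the whole waits-for graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour: Dict[str, int] = {}

    def visit(n: str) -> bool:
        colour[n] = GREY
        for m, _ in graph.get(n, []):
            c = colour.get(m, WHITE)
            if c == GREY or (c == WHITE and visit(m)):
                return True  # reached a node still on the DFS stack: cycle
        colour[n] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


def reverse_soft_edge(graph: Graph, p: str, q: str) -> Graph:
    """Move p ahead of q: drop edge p->q and add soft edge q->p (returns a copy)."""
    new = {n: [e for e in es if not (n == p and e[0] == q)] for n, es in graph.items()}
    new.setdefault(q, []).append((p, "soft"))
    return new


def first_fixing_reversal(graph: Graph) -> Optional[Tuple[str, str]]:
    """Try each soft edge alone; return the first whose reversal removes every cycle."""
    for p, edges in graph.items():
        for q, kind in edges:
            if kind == "soft" and not has_cycle(reverse_soft_edge(graph, p, q)):
                return (p, q)
    return None


# A jumps ahead of B, and nothing else links A back to B: the reversal works.
print(first_fixing_reversal({"A": [("B", "soft")], "B": [("A", "hard")]}))  # ('A', 'B')
# A jumps ahead of B but still waits for B indirectly via C: no single reversal helps.
print(first_fixing_reversal({"A": [("B", "soft"), ("C", "hard")],
                             "C": [("B", "hard")], "B": [("A", "hard")]}))  # None
```

The second graph has the shape the argument requires for a reversal to fail (the queue-jumper A still reaches B through C), but it is only an abstract graph: it already contains the all-hard cycle A->C->B->A, so it does not show that real lock queues can produce such a case.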

I also can't construct a test case where ExpandConstraints returns
false (see TestConfiguration); the wait orderings it generates are
always self-consistent.
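Roughly, deadlock.c's ExpandConstraints hands each lock's "this waiter must come before that one" constraints to a topological sort (TopoSort) to build a new wait-queue order, and reports failure if no order satisfies them. The hypothetical Python sketch below shows how such a failure arises; the function name, the representation, and the tie-breaking rule (earliest original queue position wins) are assumptions of this sketch, not a transcription of TopoSort.

```python
from typing import Dict, List, Optional, Set, Tuple


def topo_sort(queue: List[str], before: List[Tuple[str, str]]) -> Optional[List[str]]:
    """Reorder one lock's wait queue so that each (a, b) in `before` puts a ahead of b.

    Assumes every process named in `before` is in `queue`. Ties are broken by
    original queue position. Returns None if the constraints contradict each
    other, i.e. they form a cycle.
    """
    preds: Dict[str, Set[str]] = {p: set() for p in queue}
    for a, b in before:
        preds[b].add(a)

    result: List[str] = []
    remaining = list(queue)
    while remaining:
        # Pick the earliest remaining process whose required predecessors are placed.
        for p in remaining:
            if preds[p] <= set(result):
                break
        else:
            return None  # every remaining process still waits on another: cycle
        result.append(p)
        remaining.remove(p)
    return result


print(topo_sort(["A", "B", "C"], [("C", "A")]))          # ['B', 'C', 'A']
print(topo_sort(["A", "B"], [("A", "B"), ("B", "A")]))  # None
```

A None result here corresponds to the "returns false" case: constraints generated from a single soft-edge reversal can only conflict if they demand both A-before-B and B-before-A on the same lock.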

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments (name, Content-Type, size):
0001-Allow-multiple-sessions-to-wait.patch (text/x-diff, 14.5 KB)
0002-Specify-permutations-for-isolation-tests-with-invali.patch (text/x-diff, 144.7 KB)
0003-Add-some-isolation-tests-for-deadlock-detection-and-.patch (text/x-diff, 8.8 KB)
