From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Idea for fixing parallel pg_dump's lock acquisition problem |
Date: | 2019-04-17 15:34:20 |
Message-ID: | 32178.1555515260@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
We just had another complaint (bug #15767) about parallel dump's
inability to cope with concurrent lock requests. The problem is
well described by the comments for lockTableForWorker():
* Acquire lock on a table to be dumped by a worker process.
*
* The master process is already holding an ACCESS SHARE lock. Ordinarily
* it's no problem for a worker to get one too, but if anything else besides
* pg_dump is running, there's a possible deadlock:
*
* 1) Master dumps the schema and locks all tables in ACCESS SHARE mode.
* 2) Another process requests an ACCESS EXCLUSIVE lock (which is not granted
* because the master holds a conflicting ACCESS SHARE lock).
* 3) A worker process also requests an ACCESS SHARE lock to read the table.
* The worker is enqueued behind the ACCESS EXCLUSIVE lock request.
* 4) Now we have a deadlock, since the master is effectively waiting for
* the worker. The server cannot detect that, however.
*
* To prevent an infinite wait, prior to touching a table in a worker, request
* a lock in ACCESS SHARE mode but with NOWAIT. If we don't get the lock,
* then we know that somebody else has requested an ACCESS EXCLUSIVE lock and
* so we have a deadlock. We must fail the backup in that case.
Failing the whole backup is, of course, not a nice outcome.
While thinking about that, it occurred to me that we could close the
gap if the server somehow understood that the master was waiting for
the worker. And it's actually not that hard to make that happen:
we could use advisory locks. Consider a dance like the following:
1. Master has A.S. lock on table t, and dispatches a dump job for t
to worker.
2. Worker chooses some key k, does pg_advisory_lock(k), and sends k
back to master.
3. Worker attempts to get A.S. lock on t. This might block, if some
other session has a pending lock request on t.
4. Upon receipt of message from worker, master also does
pg_advisory_lock(k). This blocks, but now the server can see that a
deadlock exists, and after deadlock_timeout elapses it will fix the
deadlock by letting the worker bypass the other pending lock request.
5. Once worker has the lock on table t, it does pg_advisory_unlock(k)
to release the master.
6. Master also does pg_advisory_unlock(k), and goes on about its business.
I've tested that the server side of this works, for either order of steps
3 and 4. It seems like mostly just a small matter of programming to teach
pg_dump to do this, although there are some questions to resolve, mainly
how we choose the advisory lock keys. If there are user applications
running that also use advisory locks, there could be unwanted
interference. One easy improvement is to use pg_try_advisory_lock(k) in
step 2, and just choose a different k if the lock's in use. Perhaps,
since we don't expect that the locks would be held long, that's
sufficient --- but I suspect that users might wish for some pg_dump
options to restrict the set of keys it could use.
Another point is that the whole dance is unnecessary in the normal
case, so maybe we should only do this if an initial attempt to get
the lock on table t fails. However, LOCK TABLE NOWAIT throws an
error if it can't get the lock, so this would require using a
subtransaction or starting a whole new transaction in the worker,
so maybe that's more trouble than it's worth. Some performance
testing might be called for. There's also the point that the code
path would go almost entirely untested if it's not exercised always.
Lastly, we'd probably want both the master and worker processes to
run with small values of deadlock_timeout, so as to reduce the
time wasted before the server breaks the deadlock. Are there any
downsides to that? (If so, again maybe it's not worth the trouble,
since the typical case is that no wait is needed.)
Thoughts? I'm not volunteering to write this right now, but maybe
somebody else will take up the project.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2019-04-17 15:57:35 | Re: block-level incremental backup |
Previous Message | Tom Lane | 2019-04-17 14:24:17 | Re: [patch] pg_test_timing does not prompt illegal option |