From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: win32 _dosmaperr() |
Date: | 2005-08-13 19:34:28 |
Message-ID: | 20428.1123961668@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Qingqing Zhou wrote:
>> Things could get worse because the whole database cluster may stop working
>> and waiting for the buffer the bgwriter is working on, but bgwriter is
>> waiting for (by the deadloop in pgunlink) those postgres'es to move on (so
>> that they could close the problematic xlog segment), which is a deadlock.
I think that analysis is bogus. The bgwriter only tries to unlink xlog
segments during post-checkpoint cleanup, at which point it isn't holding
any buffer locks. Likewise, while backends might wait trying to remove
a table file because the bgwriter has the file open, in that state they
aren't blocking the bgwriter either.
In the latter case, the backends will have to wait till the bgwriter
closes the file, which it'll do not later than the next checkpoint.
I wonder whether the complaints are coming from people who don't know
about that, and didn't wait long enough?
There could be a deadlock if a backend is holding open an old xlog
segment while it executes a CHECKPOINT command, because then it'll
wait for the bgwriter, and the bgwriter might think it could remove
the xlog file during the checkpoint.
Another form could only happen between two backends: A is trying to
unlink file F, which backend B has open, and then for some unrelated
reason B has to wait for a lock held by A. The bgwriter doesn't take
nor wait for locks so this doesn't apply to it.
But none of this should be happening because we're supposedly always
opening all these files with the magic sharing flag.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Brendan Jurd | 2005-08-13 19:51:16 | Re: gettime() - a timeofday() alternative |
Previous Message | Andrew Dunstan | 2005-08-13 18:24:02 | distributed performance testing |