From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SimpleLruTruncate() mutual exclusion
Date: 2019-08-01 06:51:17
Message-ID: 20190801065117.GA2355684@rfd.leadboat.com
Lists: pgsql-hackers
On Mon, Jul 29, 2019 at 12:58:17PM -0400, Tom Lane wrote:
> > On Sun, Feb 17, 2019 at 11:31:03PM -0800, Noah Misch wrote:
> >>> b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
> >>> AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> >>> checkpoint, or in the startup process. Hence, also arrange for only one
> >>> backend to call SimpleLruTruncate(AsyncCtl) at a time.
>
> >> More specifically, restrict vac_update_datfrozenxid() to one backend per
> >> database, and restrict vac_truncate_clog() and asyncQueueAdvanceTail() to one
> >> backend per cluster. This, attached, was rather straightforward.
> I'm pretty sure it was intentional that multiple backends
> could run truncation directory scans concurrently, and I don't really
> want to give that up if possible.
I want to avoid a design that requires painstaking concurrency analysis. Such
analysis is worth it for heap_update(), but vac_truncate_clog() isn't enough
of a hot path. If there's a way to make vac_truncate_clog() easy to analyze
and still somewhat concurrent, fair.
> Also, if I understand the data-loss hazard properly, it's what you
> said in the other thread: the latest_page_number could advance after
> we make our decision about what to truncate, and then maybe we could
> truncate newly-added data. We surely don't want to lock out the
> operations that can advance latest_page_number, so how does serializing
> vac_truncate_clog help?
>
> I wonder whether what we need to do is add some state in shared
> memory saying "it is only safe to create pages before here", and
> make SimpleLruZeroPage or its callers check that. The truncation
> logic would advance that value, but only after completing a scan.
> (Oh ..., hmm, maybe the point of serializing truncations is to
> ensure that we know that we can safely advance that?)
vac_truncate_clog() already records "it is only safe to create pages before
here" in ShmemVariableCache->xidWrapLimit, which it updates after any unlinks.
The trouble comes when two vac_truncate_clog() invocations run in parallel and
events interleave like this:
vac_truncate_clog() instance 1 starts, considers segment ABCD eligible to unlink
vac_truncate_clog() instance 2 starts, considers segment ABCD eligible to unlink
vac_truncate_clog() instance 1 unlinks segment ABCD
vac_truncate_clog() instance 1 calls SetTransactionIdLimit()
vac_truncate_clog() instance 1 finishes
some backend calls SimpleLruZeroPage(), creating segment ABCD
vac_truncate_clog() instance 2 unlinks segment ABCD
Serializing vac_truncate_clog() fixes that.
> Can you post whatever script you used to detect/reproduce the problem
> in the first place?
I've attached it, but it's pretty hackish. Apply the patch on commit 7170268,
make sure your --bindir is in $PATH, copy "conf-test-pg" to your home
directory, and run "make trunc-check". This incorporates xid-burn
acceleration code that Jeff Janes shared with -hackers at some point.
> PS: also, re-reading this code, it's worrisome that we are not checking
> for failure of the unlink calls. I think the idea was that it didn't
> really matter because if we did re-use an existing file we'd just
> re-zero it; hence failing the truncate is an overreaction. But probably
> some comments about that are in order.
That's my understanding. We'll zero any page before reusing it. Failing the
whole vac_truncate_clog() (and therefore not advancing the wrap limit) would
do more harm than the bit of wasted disk space. Still, a LOG or WARNING
message would be fine, I think.
Thanks,
nm
Attachment: repro-vac_truncate_clog-race-v0.patch (text/plain, 13.9 KB)
Attachment: conf-test-pg (text/plain, 832 bytes)