From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Spurious "apparent wraparound" via SimpleLruTruncate() rounding |
Date: | 2019-02-02 08:38:22 |
Message-ID: | 20190202083822.GC32531@gust.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
While testing an xidStopLimit corner case, I got this:
3656710 2019-01-05 00:05:13.910 GMT LOG: automatic aggressive vacuum to prevent wraparound of table "test.pg_toast.pg_toast_826": index scans: 0
3656710 2019-01-05 00:05:16.912 GMT LOG: could not truncate directory "pg_xact": apparent wraparound
3656710 2019-01-05 00:05:16.912 GMT DEBUG: transaction ID wrap limit is 4294486400, limited by database with OID 1
3656710 2019-01-05 00:05:16.912 GMT WARNING: database "template1" must be vacuumed within 481499 transactions
3656710 2019-01-05 00:05:16.912 GMT HINT: To avoid a database shutdown, execute a database-wide VACUUM in that database.
I think the WARNING was correct about having 481499 XIDs left before
xidWrapLimit, and the spurious "apparent wraparound" arose from this
rounding-down in SimpleLruTruncate():
cutoffPage -= cutoffPage % SLRU_PAGES_PER_SEGMENT;
...
/*
* While we are holding the lock, make an important safety check: the
* planned cutoff point must be <= the current endpoint page. Otherwise we
* have already wrapped around, and proceeding with the truncation would
* risk removing the current segment.
*/
if (ctl->PagePrecedes(shared->latest_page_number, cutoffPage))
{
LWLockRelease(shared->ControlLock);
ereport(LOG,
(errmsg("could not truncate directory \"%s\": apparent wraparound",
ctl->Dir)));
We round "cutoffPage" to make ctl->PagePrecedes(segpage, cutoffPage) return
false for the segment containing the cutoff page. CLOGPagePrecedes() (and
most SLRU PagePrecedes methods) implements a circular address space. Hence,
the rounding also causes ctl->PagePrecedes(segpage, cutoffPage) to return true
for the segment furthest in the future relative to the unrounded cutoffPage
(if it exists). That's bad. Such a segment rarely exists, because
xidStopLimit protects 1000000 XIDs, and the rounding moves truncation by no
more than (BLCKSZ * CLOG_XACTS_PER_BYTE * SLRU_PAGES_PER_SEGMENT - 1) =
1048575 XIDs. Thus, I expect to see this problem at 4.9% of xidStopLimit
values. I expect this is easier to see with multiStopLimit, which protects
only 100 mxid.
The main consequence is the false alarm. A prudent DBA will want to react to
true wraparound, but no such wraparound has occurred. Also, we temporarily
waste disk space in pg_xact. This feels like a recipe for future bugs. The
fix I have in mind, attached, is to change instances of
ctl->PagePrecedes(FIRST_PAGE_OF_SEGMENT, ROUNDED_cutoffPage) to
ctl->PagePrecedes(LAST_PAGE_OF_SEGMENT, cutoffPage). I'm inclined not to
back-patch this; does anyone favor back-patching?
Thanks,
nm
Attachment | Content-Type | Size |
---|---|---|
slru-truncate-modulo-v1.patch | text/x-diff | 1.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Graham Leggett | 2019-02-02 11:34:56 | Re: DNS SRV support for LDAP authentication |
Previous Message | Fabien COELHO | 2019-02-02 08:26:08 | Re: pg_dump multi VALUES INSERT |