From: Noah Misch <noah@leadboat.com>
To: pgsql-hackers@postgresql.org
Subject: Re: SimpleLruTruncate() mutual exclusion
Date: 2019-06-28 17:06:28
Message-ID: 20190628170628.GA1238361@rfd.leadboat.com
Lists: pgsql-hackers
On Sun, Feb 17, 2019 at 11:31:03PM -0800, Noah Misch wrote:
> I'm forking this thread from
> https://postgr.es/m/flat/20190202083822.GC32531@gust.leadboat.com, which
> reported a race condition involving the "apparent wraparound" safety check in
> SimpleLruTruncate():
>
> On Wed, Feb 13, 2019 at 11:26:23PM -0800, Noah Misch wrote:
> > 1. The result of the test is valid only until we release the SLRU ControlLock,
> > which we do before SlruScanDirCbDeleteCutoff() uses the cutoff to evaluate
> > segments for deletion. Once we release that lock, latest_page_number can
> > advance. This creates a TOCTOU race condition, allowing excess deletion:
> >
> > [local] test=# table trunc_clog_concurrency ;
> > ERROR: could not access status of transaction 2149484247
> > DETAIL: Could not open file "pg_xact/0801": No such file or directory.
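
[Editor's note: the shape of that race, restated as a standalone C sketch;
every name here is an illustrative stand-in, not the actual slru.c code.]

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Stand-ins for SLRU shared state. */
    static pthread_mutex_t control_lock = PTHREAD_MUTEX_INITIALIZER;
    static long latest_page_number = 100;

    /*
     * Stand-in for the "apparent wraparound" test.  The real test compares
     * pages circularly, but any check against latest_page_number has the
     * same check-then-act shape.
     */
    static bool
    cutoff_is_safe(long cutoff_page)
    {
        return cutoff_page <= latest_page_number;
    }

    static void
    truncate_sketch(long cutoff_page)
    {
        bool ok;

        pthread_mutex_lock(&control_lock);
        ok = cutoff_is_safe(cutoff_page);       /* time of check */
        pthread_mutex_unlock(&control_lock);

        /*
         * Window: once the lock is released, another backend can advance
         * latest_page_number, so "ok" is stale by the time we act on it,
         * and we may unlink segments that still hold live data.
         */
        if (ok)
            printf("unlinking segments before page %ld\n", cutoff_page);
    }

    int
    main(void)
    {
        truncate_sketch(50);
        return 0;
    }
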
>
> > Fixes are available:
>
> > b. Arrange so only one backend runs vac_truncate_clog() at a time. Other than
> > AsyncCtl, every SLRU truncation appears in vac_truncate_clog(), in a
> > checkpoint, or in the startup process. Hence, also arrange for only one
> > backend to call SimpleLruTruncate(AsyncCtl) at a time.
>
> More specifically, restrict vac_update_datfrozenxid() to one backend per
> database, and restrict vac_truncate_clog() and asyncQueueAdvanceTail() to one
> backend per cluster. This, attached, was rather straightforward.
Rebased. The conflicts were limited to comments and documentation.
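
[Editor's note: in the same stand-in model as the sketch above, the proposal
amounts to holding one cluster-wide lock across the whole
compute-cutoff-and-truncate sequence, so a second backend can never
interleave with it. Illustrative only; the patch uses PostgreSQL locks, not
pthreads.]

    /* Continuation of the stand-in sketch above. */
    static pthread_mutex_t truncate_mutex = PTHREAD_MUTEX_INITIALIZER;

    static long
    compute_cutoff_page(void)
    {
        return 50;      /* stand-in for the datfrozenxid-derived cutoff */
    }

    static void
    vac_truncate_clog_sketch(void)
    {
        /* Only one backend per cluster runs this section at a time. */
        pthread_mutex_lock(&truncate_mutex);
        truncate_sketch(compute_cutoff_page());
        pthread_mutex_unlock(&truncate_mutex);
    }
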
> I wonder about performance in a database with millions of small relations,
> particularly considering my intent to back-patch this. In such databases,
> vac_update_datfrozenxid() can be a major part of the VACUUM's cost. Two
> things work in our favor. First, vac_update_datfrozenxid() runs once per
> VACUUM command, not once per relation. Second, Autovacuum has this logic:
>
> * ... we skip
> * this if (1) we found no work to do and (2) we skipped at least one
> * table due to concurrent autovacuum activity. In that case, the other
> * worker has already done it, or will do so when it finishes.
> */
> if (did_vacuum || !found_concurrent_worker)
> vac_update_datfrozenxid();
>
> That makes me relatively unworried. I did consider some alternatives:
>
> - Run vac_update_datfrozenxid()'s pg_class scan before taking a lock. If we
> find the need for pg_database updates, take the lock and scan pg_class again
> to get final numbers. This doubles the work in cases that end up taking the
> lock, so I'm not betting on it being a net win.
>
> - Use LockWaiterCount() to skip vac_update_datfrozenxid() if some other
> backend is already waiting (see the sketch after this list). This is
> similar to Autovacuum's
> found_concurrent_worker test. It is tempting. I'm not proposing it,
> because it changes the states possible when manual VACUUM completes. Today,
> you can predict post-VACUUM datfrozenxid from post-VACUUM relfrozenxid
> values. If manual VACUUM could skip vac_update_datfrozenxid() this way,
> datfrozenxid could lag until some concurrent VACUUM finishes.
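
[Editor's note: for concreteness, the rejected LockWaiterCount() alternative
would look roughly like this in the same stand-in model; datfrozenxid_waiters
is a hypothetical stand-in for LockWaiterCount(), which reports how many
backends are queued on a heavyweight lock.]

    static pthread_mutex_t datfrozenxid_mutex = PTHREAD_MUTEX_INITIALIZER;
    static int datfrozenxid_waiters;    /* stand-in for LockWaiterCount() */

    static void
    maybe_update_datfrozenxid(void)
    {
        /*
         * If another backend is already queued, it will recompute
         * datfrozenxid anyway, so we could skip -- at the cost that a
         * manual VACUUM might finish before datfrozenxid reflects its
         * own relfrozenxid updates, as described above.
         */
        if (datfrozenxid_waiters > 0)
            return;

        pthread_mutex_lock(&datfrozenxid_mutex);
        /* scan pg_class and update datfrozenxid here (stand-in) */
        pthread_mutex_unlock(&datfrozenxid_mutex);
    }
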
Attachment: wrap-limits-mutex-v2.patch (text/plain, 13.1 KB)