From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-committers(at)postgresql(dot)org |
Subject: | pgsql: Account for catalog snapshot in PGXACT->xmin updates. |
Date: | 2016-11-15 20:55:55 |
Message-ID: | E1c6kln-0003j1-SO@gemulon.postgresql.org |
Lists: | pgsql-committers pgsql-hackers |
Account for catalog snapshot in PGXACT->xmin updates.

The CatalogSnapshot was not plugged into SnapshotResetXmin()'s accounting
for whether MyPgXact->xmin could be cleared or advanced. In normal
transactions this was masked by the fact that the transaction snapshot
would be older, but during backend startup and certain utility commands
it was possible to re-use the CatalogSnapshot after MyPgXact->xmin had
been cleared, meaning that recently-deleted rows could be pruned even
though this snapshot could still see them, causing unexpected catalog
lookup failures. This effect appears to be the explanation for a recent
failure on buildfarm member piculet.

To fix, add the CatalogSnapshot to the RegisteredSnapshots heap whenever
it is valid.
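The accounting problem and the fix can be illustrated with a toy model. This is only a sketch under stated assumptions, not the snapmgr.c code: every `toy_`/`Toy` name is hypothetical, the registry is a small array scanned linearly rather than a heap, and transaction IDs are compared with plain `<` instead of wraparound-aware logic. It shows that a snapshot which is not registered has no influence on the advertised xmin, and that registering it holds the xmin back for as long as it stays valid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not PostgreSQL's definitions. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

typedef struct ToySnapshot
{
    ToyXid xmin;                /* oldest XID this snapshot may still see */
} ToySnapshot;

/* Toy stand-in for the set of registered snapshots (array, not a heap). */
#define TOY_MAX_REGISTERED 8
static ToySnapshot *toy_registered[TOY_MAX_REGISTERED];
static size_t toy_nregistered = 0;

/* Toy stand-in for the xmin this backend advertises to other backends. */
static ToyXid toy_advertised_xmin = TOY_INVALID_XID;

static void
toy_register(ToySnapshot *snap)
{
    assert(toy_nregistered < TOY_MAX_REGISTERED);
    toy_registered[toy_nregistered++] = snap;
}

static void
toy_unregister(ToySnapshot *snap)
{
    for (size_t i = 0; i < toy_nregistered; i++)
    {
        if (toy_registered[i] == snap)
        {
            /* Fill the hole with the last entry; order does not matter. */
            toy_registered[i] = toy_registered[--toy_nregistered];
            return;
        }
    }
}

/*
 * Recompute the advertised xmin from the registered snapshots only.
 * With nothing registered, the xmin is cleared, which in this model
 * means "nothing this backend holds needs protecting from pruning".
 */
static void
toy_reset_xmin(void)
{
    ToyXid oldest = TOY_INVALID_XID;

    for (size_t i = 0; i < toy_nregistered; i++)
    {
        if (oldest == TOY_INVALID_XID || toy_registered[i]->xmin < oldest)
            oldest = toy_registered[i]->xmin;
    }
    toy_advertised_xmin = oldest;
}
```

In this model, a catalog snapshot with xmin 100 that is never passed to `toy_register()` leaves `toy_advertised_xmin` cleared after `toy_reset_xmin()`, which corresponds to the bug described above. Once it is registered, the advertised xmin stays at 100 until the snapshot is unregistered.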

In the previous logic, it was possible for the CatalogSnapshot to remain
valid across waits for client input, but with this change that would mean
it delays advance of global xmin in cases where it did not before. To
avoid possibly causing new table-bloat problems with clients that sit idle
for long intervals, add code to invalidate the CatalogSnapshot before
waiting for client input. (When the backend is busy, it's unlikely that
the CatalogSnapshot would be the oldest snap for very long, so we don't
worry about forcing early invalidation of it otherwise.)
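The idle-time invalidation can be sketched the same way. Again this is a toy model with hypothetical names (`ToyBackend`, `toy_prepare_to_wait_for_client`, and so on), and it assumes the catalog snapshot is the only snapshot the backend holds, so the advertised xmin follows it directly. A NULL pointer stands for "no valid catalog snapshot".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; not PostgreSQL's definitions. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

typedef struct ToySnapshot
{
    ToyXid xmin;
} ToySnapshot;

typedef struct ToyBackend
{
    ToySnapshot  catalog_storage;   /* backing storage for the snapshot */
    ToySnapshot *catalog_snapshot;  /* NULL means "not valid" */
    ToyXid       advertised_xmin;   /* what other backends would see */
} ToyBackend;

/* Return the cached catalog snapshot, taking a new one if none is valid. */
static ToySnapshot *
toy_get_catalog_snapshot(ToyBackend *b, ToyXid oldest_running)
{
    if (b->catalog_snapshot == NULL)
    {
        b->catalog_storage.xmin = oldest_running;
        b->catalog_snapshot = &b->catalog_storage;
        /* While valid, the snapshot holds back the advertised xmin. */
        b->advertised_xmin = oldest_running;
    }
    return b->catalog_snapshot;
}

/* Drop the catalog snapshot and stop holding back the advertised xmin. */
static void
toy_invalidate_catalog_snapshot(ToyBackend *b)
{
    if (b->catalog_snapshot != NULL)
    {
        b->catalog_snapshot = NULL;
        b->advertised_xmin = TOY_INVALID_XID;
    }
}

/* Called just before blocking on the client, which may be idle for long. */
static void
toy_prepare_to_wait_for_client(ToyBackend *b)
{
    toy_invalidate_catalog_snapshot(b);
    assert(b->catalog_snapshot == NULL);
}
```

The cost of this design is that the next catalog access after an idle period has to take a fresh snapshot. In exchange, an idle session no longer pins an old xmin.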

In passing, remove the CatalogSnapshotStale flag in favor of using
"CatalogSnapshot != NULL" to represent validity, as we do for the other
special snapshots in snapmgr.c. And improve some obsolete comments.

No regression test because I don't know a deterministic way to cause this
failure. But the stress test shown in the original discussion provokes
"cache lookup failed for relation 1255" within a few dozen seconds for me.

Back-patch to 9.4 where MVCC catalog scans were introduced. (Note: it's
quite easy to produce similar failures with the same test case in branches
before 9.4. But MVCC catalog scans were supposed to fix that.)

Discussion: <16447(dot)1478818294(at)sss(dot)pgh(dot)pa(dot)us>

Branch
------
REL9_4_STABLE

Details
-------
http://git.postgresql.org/pg/commitdiff/3e844a34b80355570a9cfb25becac561aee7cf82

Modified Files
--------------
src/backend/tcop/postgres.c | 6 +++
src/backend/utils/time/snapmgr.c | 101 ++++++++++++++++++++++++++++-----------
src/include/utils/snapmgr.h | 1 +
3 files changed, 79 insertions(+), 29 deletions(-)