Re: re PG 9.6x and found xmin from before relfrozenxid and removal of pg_internal.init file(s)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Reid Thompson <Reid(dot)Thompson(at)omnicell(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: re PG 9.6x and found xmin from before relfrozenxid and removal of pg_internal.init file(s)
Date: 2020-09-28 18:29:29
Message-ID: 399166.1601317769@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Reid Thompson <Reid(dot)Thompson(at)omnicell(dot)com> writes:
> On Mon, 2020-09-28 at 12:15 -0400, Tom Lane wrote:
>> I'm a bit dubious that that'd actually help, but it's perfectly safe
>> if you want to try it. pg_internal.init is just a cache file that
>> will be rebuilt if it's missing.

> appears to allow to vacuum to complete... and stops the error messages
> to the log file.

Ah, after digging around in our git history, this symptom seems to match
this bug fix:

Author: Andres Freund <andres(at)anarazel(dot)de>
Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700

Fix bugs in vacuum of shared rels, by keeping their relcache entries current.

When vacuum processes a relation it uses the corresponding relcache
entry's relfrozenxid / relminmxid as a cutoff for when to remove
tuples etc. Unfortunately for nailed relations (i.e. critical system
catalogs) bugs could frequently lead to the corresponding relcache
entry being stale.

This set of bugs could cause actual data corruption as vacuum would
potentially not remove the correct row versions, potentially reviving
them at a later point. After 699bf7d05c some corruptions in this vein
were prevented, but the additional error checks could also trigger
spuriously. Examples of such errors are:
ERROR: found xmin ... from before relfrozenxid ...
and
ERROR: found multixact ... from before relminmxid ...
To be caused by this bug the errors have to occur on system catalog
tables.

The two bugs are:

1) Invalidations for nailed relations were ignored, based on the
theory that the relcache entry for such tables doesn't
change. Which is largely true, except for fields like relfrozenxid
etc. This means that changes to relations vacuumed in other
sessions weren't picked up by already existing sessions. Luckily
autovacuum doesn't have particularly longrunning sessions.

2) For shared *and* nailed relations, the shared relcache init file
was never invalidated while running. That means that for such
tables (e.g. pg_authid, pg_database) it's not just already existing
sessions that are affected, but even new connections are as well.
That explains why the reports usually were about pg_authid et. al.

To fix 1), revalidate the rd_rel portion of a relcache entry when
invalid. This implies a bit of extra complexity to deal with
bootstrapping, but it's not too bad. The fix for 2) is simpler,
simply always remove both the shared and local init files.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion:
https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
Backpatch: 9.3-

That bug is pretty narrow, but it explains failures on pg_authid
and related catalogs. So updating to 9.6.10 or later should be
enough to prevent a recurrence.

regards, tom lane

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Rich Shepard 2020-09-28 18:42:27 Re: Modifying database schema without losing data
Previous Message Adam Scott 2020-09-28 18:14:01 Re: Modifying database schema without losing data