Re: datfrozenxid not dropping after vacuum

From: Matthew Tice <mjtice(at)gmail(dot)com>
To:
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: datfrozenxid not dropping after vacuum
Date: 2021-09-01 22:01:28
Message-ID: CA+taBv9kNz-Boi8Zvp4REc+1w8k00EGbSC6j3ATOUWeV5SdPcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hi Alvaro, thanks for the quick reply.

I'm scheduled to do my patching maintenance at the end of this month -
but at this point I don't think I'm going to make it.

Other than patching, is there a work around? For example, in #2 above:
>The fix for 2) is simpler,
> simply always remove both the shared and local init files.

I'm not familiar with the differences between 'shared' and 'local'
init files (I'd imagine I referenced a 'local' file in my original
post)?

Thanks!

Matt

On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2021-Sep-01, Matthew Tice wrote:
>
> [ problem table is pg_database ]
>
> > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> > standby databases have been patched to 10.17.
>
> Hmm, I think there was a bug in the early 10.x versions where advancing
> the xid age of shared tables would not work correctly for some reason ...
> Ah yes, this was fixed in 10.5, a mere three years ago:
>
> Author: Andres Freund <andres(at)anarazel(dot)de>
> Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
> Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
> Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
> Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
> Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
> Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700
>
> Fix bugs in vacuum of shared rels, by keeping their relcache entries current.
>
> When vacuum processes a relation it uses the corresponding relcache
> entry's relfrozenxid / relminmxid as a cutoff for when to remove
> tuples etc. Unfortunately for nailed relations (i.e. critical system
> catalogs) bugs could frequently lead to the corresponding relcache
> entry being stale.
>
> This set of bugs could cause actual data corruption as vacuum would
> potentially not remove the correct row versions, potentially reviving
> them at a later point. After 699bf7d05c some corruptions in this vein
> were prevented, but the additional error checks could also trigger
> spuriously. Examples of such errors are:
> ERROR: found xmin ... from before relfrozenxid ...
> and
> ERROR: found multixact ... from before relminmxid ...
> To be caused by this bug the errors have to occur on system catalog
> tables.
>
> The two bugs are:
>
> 1) Invalidations for nailed relations were ignored, based on the
> theory that the relcache entry for such tables doesn't
> change. Which is largely true, except for fields like relfrozenxid
> etc. This means that changes to relations vacuumed in other
> sessions weren't picked up by already existing sessions. Luckily
> autovacuum doesn't have particularly longrunning sessions.
>
> 2) For shared *and* nailed relations, the shared relcache init file
> was never invalidated while running. That means that for such
> tables (e.g. pg_authid, pg_database) it's not just already existing
> sessions that are affected, but even new connections are as well.
> That explains why the reports usually were about pg_authid et. al.
>
> To fix 1), revalidate the rd_rel portion of a relcache entry when
> invalid. This implies a bit of extra complexity to deal with
> bootstrapping, but it's not too bad. The fix for 2) is simpler,
> simply always remove both the shared and local init files.
>
> Author: Andres Freund
> Reviewed-By: Alvaro Herrera
> Discussion:
> https://postgr.es/m/20180525203736.crkbg36muzxrjj5e@alap3.anarazel.de
> https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bPgG+_GDMxe25TvUY4Sugg@mail.gmail.com
> https://postgr.es/m/CAKMFJucqbuoDRfxPDX39WhA3vJyxweRg_zDVXzncr6+5wOguWA@mail.gmail.com
> https://postgr.es/m/CAGewt-ujGpMLQ09gXcUFMZaZsGJC98VXHEFbF-tpPB0fB13K+A@mail.gmail.com
> Backpatch: 9.3-
>
>
> --
> Álvaro Herrera 39°49'30"S 73°17'W
> "El número de instalaciones de UNIX se ha elevado a 10,
> y se espera que este número aumente" (UPM, 1972)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-09-02 02:40:33 Re: oldest WAL files not removed
Previous Message Alvaro Herrera 2021-09-01 21:00:29 Re: datfrozenxid not dropping after vacuum