Re: error "can only drop stats once" brings down database

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Floris Van Nee <florisvannee(at)optiver(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: error "can only drop stats once" brings down database
Date: 2024-06-10 06:37:24
Message-ID: ZmafJO3lpMjbyHXv@paquier.xyz
Lists: pgsql-bugs

On Sat, Jun 08, 2024 at 11:52:43AM +0000, Floris Van Nee wrote:
> I've got an update about the bug. I managed to reproduce it locally
> after a lot of digging.
>
> How to repro:
> - Set up primary + replica
> - Open a psql session on both
> - On primary session: create table t (a int); select 't'::regclass::oid;
> - On replica session: select * from t;
> - On primary session: drop table t; vacuum pg_class; checkpoint;
> - Gdb attach to the backend for your primary, set a breakpoint for
> catalog.c:GetNewOidWithIndex, just before it calls GetNewObjectId()
> - On primary session: create table t (a int);
> - When it hits breakpoint, simulate oid wraparound by setting:
> ShmemVariableCache->nextOid = <the output value of the select earlier>
> This will make pg create the new table with the same oid as the previous one.
> - On primary session: drop table t; -- this triggers the replica to go down

Okay, this stuff makes the beginning of a week fun.

> The reason it crashes on replica is that the recovery process is responsible for dropping
> stats on commit, but it's not creating them on table creation. Thus, on the second create
> table call, the old shared stats entry still exists (due to a backend still having a ref to it),
> but it is never reinitialized by the logic in pgstat_reinit_entry(). On primary it's not possible
> to reach this state, because heap_create() creates the stats entry immediately when the
> table is created.
>
> I wonder what's the best way to fix this though.
> Should redo process call pgstat_create_relation somewhere, just like heap_create does?
> Should we just ignore this 'drop stats twice' error on standby?

Nah, ignoring the double-drop error does not seem right to me.
Wouldn't it make the most sense to ensure instead that the stats are
dropped on the standby when the first DROP is replayed, even if there
are still references held to them, making sure that the stats entry
with this OID is gone before the OID is reused after wraparound?
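
To make the failure mode in the quoted analysis concrete, here is a
minimal toy model of a shared stats entry with a reference count and a
"dropped" flag. It is only a sketch under assumptions: ToyStatsTable,
ToyStatsEntry, DoubleDropError and second_drop_succeeds are
hypothetical names, not PostgreSQL code, and the model assumes the
standby's entry comes into existence when a backend first touches the
table.

```python
from dataclasses import dataclass


class DoubleDropError(Exception):
    """Toy stand-in for the 'can only drop stats once' error."""


@dataclass
class ToyStatsEntry:
    refcount: int = 1       # one reference for "the entry exists"
    dropped: bool = False   # set by the first drop


class ToyStatsTable:
    """Hypothetical model of the shared stats hash table, keyed by OID."""

    def __init__(self):
        self.entries = {}

    def create(self, oid):
        entry = self.entries.get(oid)
        if entry is None:
            self.entries[oid] = ToyStatsEntry()
        elif entry.dropped:
            # Reinitialize a dropped entry that is still referenced,
            # loosely mirroring what pgstat_reinit_entry() is described
            # as doing in the report.
            entry.dropped = False
            entry.refcount += 1

    def acquire(self, oid):
        # A backend keeps a reference to the entry.
        self.entries[oid].refcount += 1

    def drop(self, oid):
        entry = self.entries.get(oid)
        if entry is None:
            return
        if entry.dropped:
            raise DoubleDropError("can only drop stats once")
        entry.dropped = True
        entry.refcount -= 1
        if entry.refcount == 0:
            # Nobody references the entry any more, so it can go away.
            del self.entries[oid]


def second_drop_succeeds(create_stats_on_table_creation):
    """Replay create/drop/create/drop for one reused OID."""
    stats = ToyStatsTable()
    oid = 16384  # arbitrary OID, reused after the simulated wraparound

    stats.create(oid)    # entry exists for the first table
    stats.acquire(oid)   # a session still holds a reference
    stats.drop(oid)      # first DROP TABLE: marked dropped, not freed

    if create_stats_on_table_creation:
        stats.create(oid)  # primary-like path: entry is reinitialized
    # Standby-like path: nothing recreates the entry on table creation.

    try:
        stats.drop(oid)  # second DROP TABLE on the reused OID
    except DoubleDropError:
        return False
    return True
```

In this model second_drop_succeeds(True) follows the primary-like path
and the second drop goes through, while second_drop_succeeds(False)
follows the standby-like path and hits the double-drop error, because
the entry is still marked as dropped and still referenced.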
--
Michael
