Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Will Mortensen <will(at)extrahop(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-bugs(at)lists(dot)postgresql(dot)org, Jacob Speidel <jacob(at)extrahop(dot)com>
Subject: Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon
Date: 2023-06-16 01:58:38
Message-ID: ZIvBzm0DwRo60Qwk@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thu, Jun 15, 2023 at 06:36:41PM -0700, Will Mortensen wrote:
> On Thu, Jun 15, 2023 at 6:01 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> It requires a few manual steps, but I have been able to stuck the
>> autovacuum launcher schedule. Nice investigation from the reporters.
>>
>> I may be missing something here, but finishing with an inconsistent
>> database list (generated based on the pgstat database entries) in the
>> autovacuum launcher is not something that can happen only because of a
>> worker, right? A normal backend would call pgstat_update_dbstats()
>> once it exists, re-creating a fresh entry with the dropped database
>> OID. Is that right?
>
> Yes, sorry, Jacob was able to repro with a normal backend just now. We
> probably should have tried that earlier. :-)

Okay, thanks for confirming. Yes, I was able to stuck the scheduler
for both. FWIW, I have initially hardcoded a stop point with an
on-disk file around InitPostgres() before we take the shared lock.
But thinking harder, it is possible to be sneaky with the following
steps to get these incorrect stat entries:
1) LOCK pg_database in a first session, on a database different than
the one to drop.
2) Connect with a second session to the database to drop, stuck
because of the previous LOCK when scanning pg_database to find the
tuple OID, during connection startup, just before taking the shared
lock.
3) Third session executing DROP DATABASE, with a session that
connected before the pg_database lock.
4) Commit transaction of first session to release pg_database lock,
the database is dropped and the second session fails to connect with
the database renamed.

> I'm also unsure if reiniting the pgstats entry (as opposed to creating
> a new one) is actually necessary or just what we happened to observe.
> We're definitely not very familiar with these internals. :-)

The proposed patch does better than that, actually, and takes the
approach of not touching the stats for the database dropped/renamed
by delaying MyDatabaseId which is what the backend uses to update the
database-level pgstat entries.
--
Michael

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Masahiko Sawada 2023-06-16 02:18:30 Re: BUG #17969: Assert failed in bloom_init() when false_positive_rate = 0.25
Previous Message Will Mortensen 2023-06-16 01:36:41 Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon