RE: error "can only drop stats once" brings down database

From: Floris Van Nee <florisvannee(at)Optiver(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: RE: error "can only drop stats once" brings down database
Date: 2024-06-02 22:01:53
Message-ID: 47ca6fcc687e408287871f7eaaaf3c3b@Optiver.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

> I suspect that the "high value" oids are related to the issue. I bet you're
> running into OID wraparound for objects at a much higher frequency than
> most users. I suspect the reason this causes problems is the problem I just
> found and described here:

I definitely suspect oid wraparound has something to do with it. However, given what I found on the other thread ( https://www.postgresql.org/message-id/flat/17947-b9554521ad963c9c%40postgresql.org ), I don't think the two have the exact same root cause. The other thread seems a clear case of 'forgot to call gc' when it should.

>
> The fact that we're not triggering "stats object garbage collection" for most
> drops also explains why you're much more likely to see this on a standby
> than on a primary. An a primary it's going to be rare to have a single backend
> live long enough to observe an oid wraparound leading to one backend
> accessing stats for the same object type with the same oid after that object
> previously having been dropped. But most stats accesses on a standby are
> going to be by the same process, the startup process. You'd still need some
> other accesses to prevent the object from being dropped "immediately", but
> that could be due to shorter lived processes.
>

As mentioned on other thread, I do think garbage collection gets called as part of pgstat_execute_transactional_drops in recovery.

There's still the case of the "skip gc of entry if pending" though, however I struggle to understand how this can lead to such an error in the recovery process after wraparound. I'd expect the gc to happen relatively frequently (at least much more frequently than a wraparound) as tables get created/dropped quite frequently. Next to that, because the recovery process is the only one creating/dropping relations on standby, it should be impossible to get in a state where the recovery "local" stats cache has a non-gc'd entry that should be dropped? Because when dropping, at least it removes the local cache entry (even if it cannot remove the shared one). So later, when creating it again in pgstat_get_entry_ref, it should always hit the pgstat_reinit_entry path?

Would it make sense to at least commit your patch to enhance the error message a bit?

-Floris

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2024-06-02 22:18:51 Re: Missing semicolumn in anonymous plpgsql block does not raise syntax error
Previous Message Sjors Gielen 2024-06-02 21:36:06 Re: BUG #18365: Inconsistent cost function between materialized and non-materialized CTE