From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Nathan Bossart <nathandbossart(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Peter Geoghegan <pg(at)bowt(dot)ie>, David Rowley <dgrowley(at)gmail(dot)com>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Subject: | Re: Trigger more frequent autovacuums of heavy insert tables |
Date: | 2025-02-25 16:02:40 |
Message-ID: | CAAKRu_Y4BPM2it4NaTLx+8OdYL2v-jLNzgN8e3eqkCA6DLBRbQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Feb 25, 2025 at 10:39 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Mon, Feb 24, 2025 at 6:35 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > I'm on the fence about adding a PANIC. We do PANIC in other places
> > where we notice corruption (like PageAddItemExtended()). But, in most
> > of the cases, it seems like we are PANICing because there isn't a
> > reasonable way to accomplish the intended task. In this case, we
> > probably can't trust the visibility map counts for that page, but the
> > pg_class columns are just estimates, so just capping relallfrozen
> > might be good enough.
>
> +1 for not unnecessarily inflating the log level. I agree with the
> principle you articulate here: a PANIC is reasonable when there's no
> sane way to continue, but for other corruption events, a WARNING is
> better. Note that one really bad consequence of using ERROR or any
> higher level in VACUUM is that VACUUM doesn't ever complete. We just
> keep retrying and dying. Even if it's only an ERROR, now we're no
> longer controlling bloat or preventing wraparound. That's extremely
> painful for users, so we don't want to end up there if we have a
> reasonable alternative.
That makes sense. I think I'll add a WARNING if we don't cap the value.
> Also, I think that it's usually a bad idea to use an Assert to test
> for data corruption scenarios. The problem is that programmers are
> more or less entitled to assume that an assertion will always hold
> true, barring bugs, and just ignore completely what the code might do
> if the assertion turns out to be false. But data does get corrupted on
> disk, so you SHOULD think about what the code is going to do in such
> scenarios. Depending on the details, you might want to WARNING or
> ERROR or PANIC or try to repair it or try to just tolerate it being
> wrong or something else -- but you should be thinking about what the
> code is actually going to do, not just going "eh, well, that can't
> happen".
I agree with this. I don't see what the assertion would catch in this
case. Even as I try to think of a scenario where this would help the
developer, if you write code that corrupts the visibility map, I don't
think waiting for an assert at the end of a vacuum will be your first
or best signal.
This does however leave me with the question of how to handle the
original question of whether or not to cap the proposed relallfrozen
to the value of relallvisible when updating stats at the end of
vacuum. The current code in heap_vacuum_rel() caps relallvisible to
relpages, so capping relallfrozen to relallvisible would follow that
pattern. However, the other places relallvisible is updated do no such
capping (do_analyze_rel(), index_update_stats()). It doesn't seem like
there is a good reason to do it one place and not the others. So, I
suggest either removing all the caps and adding a WARNING or capping
the value in all places. Because users can now manually update these
values in pg_class, there wouldn't be a way to detect the difference
between a bogus relallfrozen value due to VM corruption or a bogus
value due to manual statistics intervention. This led me to think that
a WARNING and no cap would be more effective for heap_vacuum_rel().
- Melanie
From | Date | Subject | |
---|---|---|---|
Next Message | Nathan Bossart | 2025-02-25 16:06:59 | Re: Trigger more frequent autovacuums of heavy insert tables |
Previous Message | Ramanathan | 2025-02-25 15:56:13 | Re: Proposal - Reduce lock during first phase of VACUUM TRUNCATE from ACCESS EXCLUSIVE to EXCLUSIVE |