Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
Date: 2024-04-12 11:14:31
Message-ID: CA+hUKGL8iy7TYgCh_RgWFiAT81MhsA5DyGP4_cWbTdc0CMm2-g@mail.gmail.com
Lists: pgsql-bugs

On Fri, Apr 12, 2024 at 6:41 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Thu, Apr 11, 2024 at 6:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > 10.04.2024 14:00, PG Bug reporting form wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference: 18426
> > > ...
> > > A demo test for the issue to follow...
>
> I didn't try your test but your explanation seems clear.
> RelationTruncate() logs first, then calls smgrtruncate() which drops
> buffers and then truncates files. The dropping-the-buffers phase is
> now interruptible, since commit d87251048a0f. If you interrupt it
> there, the situation is bad: you have logged the truncation, but left
> (1) buffers and (2) untruncated files on the primary. Relation size
> being out of sync is a recipe for that PANIC next time the WAL
> mentions blocks past the (primary's) end. First thought is that that
> particular wait might need to hold interrupts. Hmm. The comments for
> RelationTruncate() contemplate but reject a critical section.
> Presumably it's waiting for another backend to flush data, and that
> other backend will eventually finish doing that or fail/crash.

That surely needs fixing, but while thinking about the difference
between holding interrupts and declaring a critical section, I'm
wondering if the lack of the latter has other pre-existing nasty
failure modes:

1. We throw away potentially dirty buffers, and then we ereport while
trying to truncate a file: now what stops some old ghost block
contents from coming back to life (read from disk in the untruncated
file)?
2. We already told downstream servers to truncate. Now the sizes are
out of sync, so what stops us logging more references to the ghost
pages and panicking replicas? (Same as this interruption issue).
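
To make the ordering problem concrete, here is a toy model in C. It is
not PostgreSQL code: the Cluster struct and every function name are made
up for illustration, and the standby is reduced to one assumption, namely
that replaying a reference to a block at or past its end of file is
fatal. The sketch only shows the three steps described above (log, drop
buffers, truncate the file) and what happens if an error escapes between
logging and truncating.

```c
/* Toy model of the log-then-truncate ordering. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

typedef struct
{
    int  primary_nblocks;   /* relation size on the primary, in blocks */
    int  standby_nblocks;   /* relation size on the standby after replay */
    bool standby_panicked;  /* set once replay hits a block past its EOF */
} Cluster;

/* Standby side: replay of a truncation record. */
static void
replay_truncate(Cluster *c, int new_nblocks)
{
    c->standby_nblocks = new_nblocks;
}

/* Standby side: replay of a record touching blkno (model assumption:
 * a block at or past the standby's EOF is fatal). */
static void
replay_block_ref(Cluster *c, int blkno)
{
    if (blkno >= c->standby_nblocks)
        c->standby_panicked = true;
}

/* Primary side: returns false if the operation was interrupted. */
static bool
truncate_relation(Cluster *c, int new_nblocks, bool interrupted)
{
    assert(new_nblocks <= c->primary_nblocks);

    /* Step 1: the truncation is logged, so the standby will apply it. */
    replay_truncate(c, new_nblocks);

    /* Step 2: dropping buffers; an error thrown here skips step 3. */
    if (interrupted)
        return false;

    /* Step 3: the file on the primary is actually truncated. */
    c->primary_nblocks = new_nblocks;
    return true;
}

/* Truncate from 10 to 4 blocks, then log a change to the primary's
 * last block. Returns true if the standby panicked. */
bool
run_scenario(bool interrupted)
{
    Cluster c = {10, 10, false};

    truncate_relation(&c, 4, interrupted);
    replay_block_ref(&c, c.primary_nblocks - 1);
    return c.standby_panicked;
}
```

In this model run_scenario(true) leaves the primary at 10 blocks and the
standby at 4, so the later reference to block 9 trips the standby, while
run_scenario(false) keeps both at 4 blocks. Passing interrupted = false
stands in for any fix that prevents an error from escaping between steps
1 and 3; it says nothing about which mechanism should provide that.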
