
Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC
Date:	2024-04-12 11:14:31
Message-ID:	CA+hUKGL8iy7TYgCh_RgWFiAT81MhsA5DyGP4_cWbTdc0CMm2-g@mail.gmail.com
Lists:	pgsql-bugs

On Fri, Apr 12, 2024 at 6:41 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Thu, Apr 11, 2024 at 6:00 AM Alexander Lakhin <exclusion(at)gmail(dot)com> wrote:
> > 10.04.2024 14:00, PG Bug reporting form wrote:
> > > The following bug has been logged on the website:
> > >
> > > Bug reference: 18426
> > > ...
> > > A demo test for the issue to follow...
>
> I didn't try your test but your explanation seems clear.
> RelationTruncate() logs first, then calls smgrtruncate() which drops
> buffers and then truncates files. The dropping-the-buffers phase is
> now interruptible, since commit d87251048a0f. If you interrupt it
> there, the situation is bad: you have logged the truncation, but left
> (1) buffers and (2) untruncated files on the primary. Relation size
> being out of sync is a recipe for that PANIC next time the WAL
> mentions blocks past the (primary's) end. First thought is that that
> particular wait might need to hold interrupts. Hmm. The comments for
> RelationTruncate() contemplate but reject a critical section.
> Presumably it's waiting for another backend to flush data, and that
> other backend will eventually finish doing that or fail/crash.

That surely needs fixing, but while thinking about the difference
between holding interrupts and declaring a critical section, I'm
wondering if the lack of the latter has other pre-existing nasty
failure modes:

1. We throw away potentially dirty buffers, and then we ereport while
trying to truncate a file: now what stops some old ghost block
contents from coming back to life (read from disk in the untruncated
file)?
2. We already told downstream servers to truncate. Now the sizes are
out of sync, so what stops us logging more references to the ghost
pages and panicking replicas? (Same as this interruption issue).
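
To make the ordering problem concrete, here is a toy model of the sequence described above. It is not PostgreSQL code: the classes `Primary` and `Standby`, the `Interrupted` exception, and the `run_scenario` helper are all hypothetical names invented for illustration, and the "WAL" is just a Python list. The only thing it tries to show is that logging the truncation first and then failing before the file is shortened leaves the two sides disagreeing about the relation size.

```python
class Interrupted(Exception):
    """Stands in for a cancel request arriving mid-operation (assumption)."""


class Standby:
    def __init__(self, nblocks):
        self.nblocks = nblocks

    def replay(self, record):
        kind, arg = record
        if kind == "truncate":
            self.nblocks = arg
        elif kind == "modify" and arg >= self.nblocks:
            # Replay cannot recover from a reference past the end.
            raise RuntimeError(
                f"PANIC: WAL references block {arg}, "
                f"relation has {self.nblocks} blocks"
            )


class Primary:
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.wal = []

    def truncate(self, new_nblocks, interrupt_during_drop=False):
        # Step 1: the truncation is logged first.
        self.wal.append(("truncate", new_nblocks))
        # Step 2: dropping buffers; in this model it can be interrupted.
        if interrupt_during_drop:
            raise Interrupted("cancel while dropping buffers")
        # Step 3: only now does the file actually shrink.
        self.nblocks = new_nblocks

    def modify(self, block):
        if block >= self.nblocks:
            raise ValueError("block past end of relation on primary")
        self.wal.append(("modify", block))


def run_scenario(interrupt):
    primary, standby = Primary(10), Standby(10)
    try:
        primary.truncate(5, interrupt_during_drop=interrupt)
    except Interrupted:
        pass  # the backend survives; the WAL record is already written
    # The primary keeps using what it believes is its last block.
    primary.modify(primary.nblocks - 1)
    try:
        for record in primary.wal:
            standby.replay(record)
    except RuntimeError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    print("uninterrupted:", run_scenario(False))
    print("interrupted:  ", run_scenario(True))
```

In this sketch the interrupted run leaves the primary at 10 blocks while the standby has replayed a truncation to 5, so the next logged reference to block 9 fails on replay. An ereport during the file truncation step would diverge the same way, which is the point of item 2.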

In response to

Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC at 2024-04-12 06:41:30 from Thomas Munro

Responses

Re: BUG #18426: Canceling vacuum while truncating a relation leads to standby PANIC at 2024-04-23 20:26:09 from Thomas Munro
