Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, rootcause000(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Date: 2024-12-19 15:44:05
Message-ID: CA+TgmoZj9dJhHQokHioXB8dyDZz5gjzyg7SbUrG=At-uShq7yA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, Dec 11, 2024 at 8:26 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> > I do wonder whether the new smgrtruncatefrom() should be used
> > everywhere instead of just from one of the call sites, but even if
> > that's desirable long-term, doing this much is still a lot better than
> > doing nothing. This data corrupting bug has been known and understood
> > for more than 4 years at this point.
>
> There are only three callers of smgrtruncate() in our tree:
>
> 1. This one.
> 2. Its redo-time counterpart.
> 3. pg_truncate_visibility_map(), in contrib/pg_visibility.
>
> The recovery environment promotes errors of both types to FATAL (the
> "we have no handler" case in errstart()), and the postmaster handles
> startup failure a bit like a PANIC. Or do you think PANIC and more
> similar code would be better?

I think mostly my thought was that if we could remove smgrtruncate()
entirely in favor of smgrtruncatefrom(), or just keep it called
smgrtruncate() but add a mandatory additional argument, that would be
less error-prone than having two versions between which future hackers
must pick.

> I am curious about the non-flushing though. RelationTruncate() also
> doesn't flush if there is no fsm or vm. Perhaps it always flushes in
> practice since VACUUM itself creates those, I'm not entirely sure yet.

I agree that we should make that unconditional. I think your analysis
shows that abandoning the WAL-before-data isn't safe, but even if
there were some doubt about whether your analysis is correct, we
should have a very strong bias against thinking that a rule as
fundamental as WAL-before-data is negotiable.

> PFA the first sketch of a patch to make pg_truncate_visibility_map()
> into... a cut-down version of RelationTruncate(). Still looking into
> the flushing, without which the comments in the patch are entirely
> bogus.

The patch looks to good to me. It seems to me that you could just add
an XLogFlush() call.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2024-12-20 02:53:43 Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Previous Message Thomas Munro 2024-12-19 02:45:47 Re: Build failure with GCC 15 (defaults to -std=gnu23)