Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Alexander Lakhin <exclusion(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, rootcause000(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Date: 2024-09-09 07:21:19
Message-ID: Zt6h7y308oXttagv@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Sep 06, 2024 at 11:05:26AM +1200, Thomas Munro wrote:
> But let's not forget that these patches also fix two bugs that apply
> to Unix too.
>
> For the DELAY_CHKPT_START bug, we should back-patch all the way.
>
> For the WaitIO() bug affecting all OSes, that only needs to go back to
> 14. We could also opt to be cautious and let it run on master for a
> while before we do that, though.
>
> For ftruncate() failure, I wouldn't be too bothered if we just let
> sleeping dogs lie in 13. It affects only systems that have serious
> file system corruption, or Windows systems that have something
> snooping on private files with antisocial flags.

0001 looks OK seen from here.

+GetRelationPathInPlace(char *path,
+ Oid dbOid, Oid spcOid, RelFileNumber relNumber,

It's very easy to handle such APIs the wrong way. I'd suggest to also
give the size of the output buffer as an argument of the function,
cross-checking in the function that nothing is wrong with the path
generated.

The trick to use _mdfd_segpath() through smgrpreparetruncate() smells
too much of magic to me. I don't have an idea that does not involve
tweaking the interface of smgr.c or some of its structures on top of
my mind, though, which is to save the path before the critical section
and pass it through (like a private memory area that can be used by
md.c).. Or do a MemoryContextAllowInCriticalSection, even if it is
not its original purpose.

Also, now that we have injection points, could this stuff be worth
adding a test? You could force an error state that gets upgraded to a
PANIC while doing the truncation, checking that we don't lose data
after recovery, for example. This is a problem old enough that I'd
rather see some automation in place with a long-term picture in mind,
even if it means a test for 17~ or just HEAD.
--
Michael

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2024-09-09 07:40:41 BUG #18606: syntax error at or near "ROWS"
Previous Message PG Bug reporting form 2024-09-09 05:04:48 BUG #18605: cursor select not working