Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, rootcause000(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Date: 2024-09-05 23:05:26
Message-ID: CA+hUKG+g8ydXzSnHQPtNhmwNhn8A-FborSZGSLg62tivaugP0g@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jun 27, 2024 at 8:59 AM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> On 26/06/2024 23:58, Heikki Linnakangas wrote:
> > On 14/05/2024 16:00, Alexander Lakhin wrote:
> >> 23.04.2024 10:48, Thomas Munro wrote:
> >>> Here is a new attempt to see what it might take to put
> >>> RelationTruncate() into a critical section.
> >>
> >> When running 027_stream_regress on a slow machine with the aggressive
> >> autovacuum settings, having those patches applied, I've stumbled upon:
> >> TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "mcxt.c", Line: 1353, PID: 24468

> > This should've also been fixed by commit b1ffe3ff0b.
>
> To clarify, commit b1ffe3ff0b only fixed this assertion failure, which
> occurs if you call RelationTruncate in a critical section, as this
> patch does. It did not fix the original issue.

Thanks Alexander and Heikki.

I have rebased these patches over c7cd2d6e, which introduced the
PG_TBLSPC_DIR macro used in path construction. I added them to the
commitfest so we don't lose track of them, and applied two changes
requested by Michael upthread: I fixed a thinko in a commit message,
and I split the GetRelationPathInPlace() function into its own patch.

Andres and Noah are discussing new ways to solve the
can't-call-palloc-in-critical-section problem[1], but if we want any
chance to be able to back-patch *this* fix, then I think we need to
invent GetRelationPathInPlace() anyway, no?
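
To illustrate the in-place idea, here is a minimal sketch, not the code
from the attached patch. The function name, its signature and the buffer
size are hypothetical, and it handles only the default-tablespace
"base/<db>/<rel>" layout. The point is that the caller supplies the
buffer, so nothing is allocated inside the critical section:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bound for this sketch only; not PostgreSQL's real limit. */
#define SKETCH_MAX_PATH 64

/*
 * Format "base/<db_oid>/<rel_number>" into a caller-supplied buffer.
 * No heap allocation happens here, which is what makes this style
 * usable where palloc() is forbidden.
 *
 * Returns the path length on success, or -1 if the buffer is too small.
 */
int
sketch_relpath_in_place(char *buf, size_t buflen,
                        unsigned db_oid, unsigned rel_number)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen, "base/%u/%u", db_oid, rel_number);
    if (n < 0 || (size_t) n >= buflen)
        return -1;              /* output error or truncation */
    return n;
}
```

A caller would declare something like char path[SKETCH_MAX_PATH] on the
stack before entering the critical section and pass it in.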

I realised that my earlier speculation about Windows'
ERROR_USER_MAPPED_FILE was bogus, because that'd be translated to
EINVAL, and here we have EACCES ("Permission denied"). I had been
looking for explanations for just ftruncate() on its own to fail,
thinking that the file was already open, but I had forgotten that
FileTruncate() might need to reopen the file in the vfd layer. That
doesn't require any exotic new explanations: it'd fail like that if
programs unknown had opened the file without the FILE_SHARE_XXX flags,
and our pgwin32_open() kludge failed to open the file after 300 sleep
retry loops.
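
The failure mode described above can be sketched in portable C. This is
not pgwin32_open() itself: the real code reacts to Windows sharing
violations from CreateFile() and sleeps between attempts, whereas this
sketch stands in a fake opener that reports EACCES and omits the sleep.
All names here are hypothetical.

```c
#include <errno.h>

/* An opener returns a descriptor >= 0, or -1 with errno set. */
typedef int (*sketch_open_fn) (void *state);

/*
 * Retry an open while it fails with EACCES, up to max_loops attempts.
 * If every attempt fails, return -1 and leave errno as EACCES, which
 * is what the caller would then report as "Permission denied".
 */
int
sketch_open_with_retry(sketch_open_fn try_open, void *state, int max_loops)
{
    int loops = 0;

    for (;;)
    {
        int fd = try_open(state);

        if (fd >= 0)
            return fd;          /* opened successfully */
        if (errno != EACCES)
            return -1;          /* a different failure: do not retry */
        if (++loops >= max_loops)
            return -1;          /* gave up; errno is still EACCES */
        /* a real implementation would sleep here before retrying */
    }
}

/* Fake opener: fails with EACCES until the counter in *state hits 0. */
int
sketch_fake_open(void *state)
{
    int *remaining = state;

    if (*remaining > 0)
    {
        (*remaining)--;
        errno = EACCES;
        return -1;
    }
    return 3;                   /* arbitrary descriptor for the demo */
}
```

If the other program lets go within the retry budget, the open succeeds
and nobody notices. If it holds the file longer than that, the
truncation fails with EACCES.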

But let's not forget that these patches also fix two bugs that apply
to Unix too.

For the DELAY_CHKPT_START bug, we should back-patch all the way.

For the WaitIO() bug affecting all OSes, that only needs to go back to
14. We could also opt to be cautious and let it run on master for a
while before we do that, though.

For ftruncate() failure, I wouldn't be too bothered if we just let
sleeping dogs lie in 13. It affects only systems that have serious
file system corruption, or Windows systems that have something
snooping on private files with antisocial flags.

[1] https://www.postgresql.org/message-id/flat/h3a7ftrxypgxbw6ukcrrkspjon5dlninedwb5udkrase3rgqvn%403cokde6btlrl

Attachment Content-Type Size
v3-0001-RelationTruncate-must-set-DELAY_CHKPT_START.patch text/x-patch 3.9 KB
v3-0002-Introduce-GetRelationPathInPlace.patch text/x-patch 4.9 KB
v3-0003-RelationTruncate-must-use-a-critical-section.patch text/x-patch 11.9 KB
