Re: Back-patch of: avoid multiple hard links to same WAL file after a crash

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Robert Pang <robertpang(at)google(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Back-patch of: avoid multiple hard links to same WAL file after a crash
Date: 2024-12-19 05:44:53
Message-ID: Z2Oy1Z2nMVmTM5L5@paquier.xyz
Lists: pgsql-hackers

On Wed, Dec 18, 2024 at 08:51:20PM -0500, Andres Freund wrote:
> I don't think the issue is actually quite as unlikely to be hit as reasoned in
> the commit message. The crash does indeed have to happen between the link() and
> unlink(), but at the end of a checkpoint we do those operations hundreds of
> times in a row on a busy server. And that's just after potentially doing lots
> of write IO during a checkpoint, filling up drive write caches / eating up
> IOPS/bandwidth disk quotas.

Looks like it, yes. The timing you describe and the timing in the report are interesting.
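To make the window concrete: a move done as link() followed by unlink() is two separate system calls, so a crash between them leaves two directory entries pointing at the same inode. The C sketch below performs only the first step, as if the server crashed right after it, and checks the result. The function name and the /tmp paths are hypothetical and chosen for illustration; this is not PostgreSQL code.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo: create `oldpath`, then perform only the first half of a
 * link()+unlink() move, as if a crash hit before unlink() ran.  Returns the
 * resulting hard-link count of the file (2 means two names, one inode), or
 * -1 on error.
 */
static int
simulate_crash_between_link_and_unlink(const char *oldpath, const char *newpath)
{
    struct stat st_old, st_new;
    int fd = open(oldpath, O_CREAT | O_WRONLY | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    close(fd);

    /* Step 1: add the new name for the same file. */
    if (link(oldpath, newpath) != 0)
        return -1;

    /* A crash here means step 2, unlink(oldpath), never happens. */

    if (stat(oldpath, &st_old) != 0 || stat(newpath, &st_new) != 0)
        return -1;
    assert(st_old.st_ino == st_new.st_ino);    /* same file, two names */
    return (int) st_new.st_nlink;
}
```

With many such moves in a row at the end of a checkpoint, each one is a separate chance to land in that window.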

I've been double-checking the code to refresh my memory of the problem,
and I don't see a reason not to apply something like the attached set
to all the remaining branches, down to v13 (minus an edit of the
commit message).
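For reference, the general shape of a durable rename is: flush the source file, rename() it, then flush the renamed file and its parent directory so the new directory entry reaches disk. Unlike link()+unlink(), rename() switches the name in a single step, so a crash should not leave both names referring to the same file, assuming the filesystem preserves rename() atomicity across crashes. The sketch below is only loosely modeled on that pattern; `durable_rename_sketch` and `fsync_path` are hypothetical names, and this is not the code in the attached patches or in PostgreSQL's fd.c.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fsync a file or directory by path; 0 on success. */
static int
fsync_path(const char *path, int is_dir)
{
    /* Directories can only be opened read-only; open files read-write. */
    int fd = open(path, is_dir ? O_RDONLY : O_RDWR);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical sketch of a crash-safe rename: flush the source, rename it in
 * one step, then flush the new file and its parent directory.
 */
static int
durable_rename_sketch(const char *oldpath, const char *newpath)
{
    char dirbuf[4096];

    assert(oldpath != NULL && newpath != NULL);
    if (fsync_path(oldpath, 0) != 0)
        return -1;
    if (rename(oldpath, newpath) != 0)  /* single step, no window with two names */
        return -1;
    if (fsync_path(newpath, 0) != 0)
        return -1;

    /* dirname() may modify its argument, so work on a copy. */
    if (strlen(newpath) >= sizeof(dirbuf))
        return -1;
    strcpy(dirbuf, newpath);
    return fsync_path(dirname(dirbuf), 1);
}
```

Calling fsync() on a directory opened read-only works on Linux, but behavior varies across platforms, so treat this as a sketch rather than portable code.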

Thoughts?
--
Michael

Attachment Content-Type Size
0001-Replace-durable_rename_excl-by-durable_rename-ta-v15.patch text/x-diff 6.7 KB
0001-Replace-durable_rename_excl-by-durable_rename-ta-v14.patch text/x-diff 5.9 KB
0001-Replace-durable_rename_excl-by-durable_rename-ta-v13.patch text/x-diff 5.9 KB
