Re: avoid multiple hard links to same WAL file after a crash

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us, robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: avoid multiple hard links to same WAL file after a crash
Date: 2022-05-02 17:36:57
Message-ID: 20220502173657.GA3351307@nathanxps13
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 01, 2022 at 10:08:53PM +0900, Michael Paquier wrote:
> At the end, switching directly from durable_rename_excl() to
> durable_rename() should be fine for the WAL segment initialization,
> but we could do things a bit more carefully by adding a check on the
> file existence before calling durable_rename() and issue a elog(LOG)
> if a file is found, giving a mean for the WAL recycling to give up
> peacefully as it does now. Per my analysis, the TLI history file
> created at the end of recovery ought to issue an elog(ERROR).

My only concern with this approach is that it inevitably introduces a race
condition. In most cases, the file existence check will prevent
overwrites, but it might not always. Furthermore, we believe that such
overwrites either 1) should not happen (e.g., WAL recycling) or 2) won't
cause problems if they happen (e.g., when the WAL receiver writes the TLI
history file). Also, these races will be difficult to test, so we won't
know what breaks when they occur.

My instinct is to just let the overwrites happen. That way, we are more
likely to catch breakage in tests, and we'll have one less race condition
to worry about. I don't mind asserting that the file doesn't exist when we
don't expect it to, as that might help catch potential problems in
development without affecting behavior in production. If we do want to
add file existence checks, I think we'd better add a comment about the
potential for race conditions.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2022-05-02 17:39:07 Re: avoid multiple hard links to same WAL file after a crash
Previous Message Tomas Vondra 2022-05-02 17:36:54 Re: bogus: logical replication rows/cols combinations