Re: avoid multiple hard links to same WAL file after a crash

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: avoid multiple hard links to same WAL file after a crash
Date: 2022-04-18 19:07:15
Message-ID: 2370127.1650308835@sss.pgh.pa.us
Lists: pgsql-hackers

Michael Paquier <michael(at)paquier(dot)xyz> writes:
> On Fri, Apr 08, 2022 at 09:00:36PM -0400, Robert Haas wrote:
>> I wonder if this is really true. I thought rename() was supposed to be atomic.

> Not always. For example, some old versions of MacOS have a non-atomic
> implementation of rename(), like prairiedog with 10.4. Even 10.5 does
> not handle atomicity as far as I recall.

I think that's not talking about the same thing. POSIX requires rename(2)
to replace an existing target link atomically:

    If the link named by the new argument exists, it shall be removed and
    old renamed to new. In this case, a link named new shall remain
    visible to other threads throughout the renaming operation and refer
    either to the file referred to by new or old before the operation
    began.

(It's that requirement that ancient macOS fails to meet.)
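
As a rough illustration of the replacement case covered by the quoted POSIX
text, here is a minimal C sketch that renames one file over an existing
target and checks the end state. The helper names (write_file, read_file,
replace_with_rename) are hypothetical and not taken from PostgreSQL. The
sketch only shows what is visible after rename() returns; it cannot
demonstrate the atomicity guarantee toward concurrent observers.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) path and write text into it. */
static int
write_file(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF)
    {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the first line of path into buf. */
static int
read_file(const char *path, char *buf, size_t buflen)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int) buflen, fp) == NULL)
        buf[0] = '\0';
    fclose(fp);
    return 0;
}

/*
 * Create both files, then rename oldpath over the existing newpath.
 * On success, buf holds what newpath contains afterwards.
 */
static int
replace_with_rename(const char *oldpath, const char *newpath,
                    char *buf, size_t buflen)
{
    if (write_file(oldpath, "old") != 0 || write_file(newpath, "new") != 0)
        return -1;

    /* The target already exists; rename(2) replaces it. */
    if (rename(oldpath, newpath) != 0)
        return -1;

    /* After a successful rename, the old name should no longer exist. */
    if (access(oldpath, F_OK) == 0)
        return -1;

    return read_file(newpath, buf, buflen);
}
```

Calling replace_with_rename() with two scratch file names in a writable
directory should return 0 and leave the contents that were written to the
old path in buf.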

However, I do not see any text that addresses the question of whether
the old link disappears atomically with the appearance of the new link,
and it seems like that'd be pretty impractical to ensure in cases like
moving a link from one directory to another. (What would it even mean
to say that, considering that a thread can't read the two directories
at the same instant?) From a crash-safety standpoint, it'd surely be
better to make the new link before removing the old, so I imagine
that's what most file systems do.

regards, tom lane
