From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | InstallXLogFileSegment() vs concurrent WAL flush |
Date: | 2024-02-02 10:18:18 |
Message-ID: | CA+hUKGLO02j2WLiQ73iZ+CEY1G+LPmHo3PXaYTaFY9Hj222mEQ@mail.gmail.com |
Lists: | pgsql-hackers |
Hi,
New WAL space is created by renaming a file into place: either a
newly created file with a temporary name or, ideally, a recyclable old
file with a name derived from an old LSN. I think there is a data
loss window between rename() and fsync(parent_directory). A
concurrent backend might open(new_name), write(), fdatasync(), and
then we might lose power before the rename hits the disk. The data
itself would survive the crash, but recovery wouldn't be able to find
and replay it. That might break the log-before-data rule or forget a
transaction that has been reported as committed to a client.
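
To make the window concrete, here is a minimal POSIX sketch of the rename-then-fsync sequence. It is not the PostgreSQL code: `install_segment()` and `fsync_directory()` are hypothetical names chosen for illustration, and error handling is reduced to return codes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Flush a directory so that renames inside it survive a power loss. */
static int
fsync_directory(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Hypothetical stand-in for the install step: rename a file into place,
 * then make the new directory entry durable.
 */
static int
install_segment(const char *dir, const char *tmp_path, const char *new_path)
{
    assert(dir != NULL && tmp_path != NULL && new_path != NULL);

    if (rename(tmp_path, new_path) != 0)
        return -1;

    /*
     * Window: new_path is already visible to other processes, which can
     * open(), write() and fdatasync() it, but the rename itself is not
     * yet guaranteed to be on disk.
     */
    return fsync_directory(dir);
}
```

A process that opens `new_path` between the `rename()` and the directory `fsync()` can flush data into a file whose name might not survive a crash.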

Actual breakage would presumably require really bad luck, and I
haven't seen this happen. It just occurred to me while reading code,
and I can't see any existing defences.

One simple way to address that would be to make XLogFileInitInternal()
wait for InstallXLogFileSegment() to finish. It's a little
pessimistic to do that unconditionally, though, as then you have to
wait even for rename operations for segment files later than the one
you're opening, so I thought about how to skip waiting in that case --
see 0002. I'm not sure if it's worth worrying about or not.
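
The idea of skipping the wait can be sketched roughly as follows. This is an illustration only and is not taken from the attached patches: the variable `durable_end_segno` and both function names are hypothetical, and it assumes that segments become durably installed in increasing order, so that a single high-water mark is enough.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical shared state: every segment number below this value is
 * assumed to have had its rename flushed to disk.
 */
static _Atomic uint64_t durable_end_segno = 0;

/* Called by the installer after the directory fsync has completed. */
static void
note_segment_durable(uint64_t segno)
{
    uint64_t cur = atomic_load(&durable_end_segno);

    /* Only ever advance the mark; on CAS failure cur is reloaded. */
    while (cur < segno + 1 &&
           !atomic_compare_exchange_weak(&durable_end_segno, &cur, segno + 1))
        ;
}

/* Called before opening a segment for writing. */
static bool
opener_must_wait(uint64_t segno)
{
    return segno >= atomic_load(&durable_end_segno);
}
```

With this shape, a backend opening a segment below the mark proceeds immediately, and only a backend opening a segment at or beyond it has to wait for an in-progress install to finish.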
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-InstallXLogFileSegment-concurrency-bug.patch | application/octet-stream | 1.6 KB |
0002-Track-end-of-installed-WAL-space-in-shared-memory.patch | application/octet-stream | 2.2 KB |