From: | "Bossart, Nathan" <bossartn(at)amazon(dot)com> |
---|---|
To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | race condition when writing pg_control |
Date: | 2020-05-04 17:44:21 |
Message-ID: | 70BF24D6-DC51-443F-B55A-95735803842A@amazon.com |
Lists: | pgsql-hackers |
Hi hackers,

I believe I've discovered a race condition between the startup and
checkpointer processes that can cause a CRC mismatch in the pg_control
file. If a cluster crashes at the right time, the following error
appears when you attempt to restart it:

    FATAL: incorrect checksum in control file

This appears to be caused by some code paths in xlog_redo() that
update ControlFile without taking the ControlFileLock. The attached
patch seems to be sufficient to prevent the CRC mismatch in the
control file, but perhaps this is a symptom of a bigger problem with
concurrent modifications of ControlFile->checkPointCopy.nextFullXid.
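
To illustrate the locking pattern described above, here is a minimal standalone sketch. It is not PostgreSQL code and not the attached patch: ToyControlFile, control_lock, toy_checksum() and the other toy_* names are hypothetical stand-ins, a pthread mutex plays the role of ControlFileLock, and FNV-1a stands in for the real CRC. The point is only that the updater (startup/redo role) and the writer (checkpointer role) must take the same lock, so the checksum is always computed over the same bytes that get written out.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for the control file contents; NOT PostgreSQL's ControlFileData. */
typedef struct ToyControlFile
{
    uint64_t next_full_xid;   /* loosely mirrors checkPointCopy.nextFullXid */
    uint64_t checkpoint_lsn;  /* hypothetical second field */
    uint32_t crc;             /* checksum over the fields above */
} ToyControlFile;

/* Shared state plus a mutex playing the role of ControlFileLock (assumption). */
static ToyControlFile control;
static pthread_mutex_t control_lock = PTHREAD_MUTEX_INITIALIZER;

/* FNV-1a hash, used here only as a simple substitute for the real CRC. */
static uint32_t
toy_checksum(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Updater (startup/redo role): change a field only while holding the lock. */
static void
toy_redo_update(uint64_t xid)
{
    pthread_mutex_lock(&control_lock);
    control.next_full_xid = xid;
    pthread_mutex_unlock(&control_lock);
}

/*
 * Writer (checkpointer role): compute the checksum and copy the image out
 * under the same lock, so no update can land between the two steps.
 * The memcpy stands in for writing the file to disk.
 */
static void
toy_write_control(ToyControlFile *out)
{
    assert(out != NULL);
    pthread_mutex_lock(&control_lock);
    control.crc = toy_checksum(&control, offsetof(ToyControlFile, crc));
    memcpy(out, &control, sizeof(control));
    pthread_mutex_unlock(&control_lock);
}

/* Reader at restart: recompute the checksum and compare with the stored one. */
static int
toy_verify(const ToyControlFile *img)
{
    return img->crc == toy_checksum(img, offsetof(ToyControlFile, crc));
}
```

In this sketch, an image produced by toy_write_control() always passes toy_verify(). If a field in the image changes after the checksum was computed, which is what an unlocked update between the checksum step and the write would amount to, toy_verify() fails, analogous to the "incorrect checksum in control file" error.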

Nathan

Attachment | Content-Type | Size |
---|---|---|
v1-0001-Prevent-race-condition-when-writing-pg_control.patch | application/octet-stream | 1.3 KB |