| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" |
| Date: | 2022-11-23 22:05:03 |
| Message-ID: | 3748783.1669241103@sss.pgh.pa.us |
| Lists: | pgsql-hackers |
Thomas Munro <thomas(dot)munro(at)gmail(dot)com> writes:
> On Wed, Nov 23, 2022 at 11:03 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>> I assume this is ext4. Presumably anything that reads the
>> controlfile, like pg_ctl, pg_checksums, pg_resetwal,
>> pg_control_system(), ... by reading without interlocking against
>> writes could see garbage. I have lost track of the versions and the
>> thread, but I worked out at some point by experimentation that this
>> only started relatively recently for concurrent read() and write(),
>> but always happened with concurrent pread() and pwrite(). The control
>> file uses the non-p variants which didn't mash old/new data like
>> grated cheese under concurrency due to some implementation detail, but
>> now do.
Ugh.
> As for what to do about it, some ideas:
> 2. Retry after a short time on checksum failure. The probability is
> already minuscule, and becomes pretty close to 0 if we read thrice
> 100ms apart.
> First thought is that 2 is an appropriate level of complexity for this
> rare and stupid problem.
Yeah, I was thinking the same. A variant could be "repeat until
we see the same calculated checksum twice".
regards, tom lane
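
The "repeat until we see the same calculated checksum twice" variant could look roughly like the sketch below. This is only an illustration, not code from PostgreSQL: `read_fn`, `read_until_stable`, `fake_source`, and `fake_read` are hypothetical names, FNV-1a stands in for whatever checksum the real file format uses, and the pause between attempts (the thread suggests something like 100ms) is left as a comment to keep the example portable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reader callback: fills buf with len bytes, false on I/O error. */
typedef bool (*read_fn)(void *ctx, unsigned char *buf, size_t len);

/* FNV-1a, used here only as a stand-in for the file's real checksum. */
static uint32_t
fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Re-read until two consecutive reads produce the same calculated checksum,
 * or give up after max_attempts reads.  On success, buf holds the last read
 * and *checksum_out its checksum.
 */
static bool
read_until_stable(read_fn reader, void *ctx, unsigned char *buf, size_t len,
                  int max_attempts, uint32_t *checksum_out)
{
    uint32_t prev = 0;

    assert(max_attempts >= 2);  /* need at least two reads to compare */

    for (int attempt = 0; attempt < max_attempts; attempt++)
    {
        uint32_t cur;

        if (!reader(ctx, buf, len))
            return false;

        cur = fnv1a(buf, len);
        if (attempt > 0 && cur == prev)
        {
            *checksum_out = cur;
            return true;
        }
        prev = cur;
        /* A real caller would pause briefly here before retrying. */
    }
    return false;
}

/* Fake source for demonstration: the first read is "torn", later ones are not. */
struct fake_source
{
    int calls;
};

static bool
fake_read(void *ctx, unsigned char *buf, size_t len)
{
    struct fake_source *src = ctx;

    memset(buf, 'N', len);          /* the fully written new contents */
    if (src->calls++ == 0 && len > 0)
        buf[0] = 'O';               /* first read still sees one stale byte */
    return true;
}
```

A caller would then compare the stable calculated checksum against the one stored in the file and report corruption only if they still disagree. With `fake_read` above, the first read differs from the second, so the loop needs a third read before it sees the same checksum twice.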