From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Daniel Gustafsson <daniel(at)yesql(dot)se>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" |
Date: | 2023-07-25 01:36:03 |
Message-ID: | CA+hUKG+a+M6tbKJ5Ei2SFBDJxw4UjGLyRBDVrUfuSBZZ0ht0LQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jul 25, 2023 at 8:18 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> (Yeah, I know we have code to verify checksums during a base
> backup, but as discussed elsewhere, it doesn't work.)
BTW the code you are referring to there seems to think 4KB
page-halves are atomic; not sure if that's imagining page-level
locking in ancient Linux (?), or imagining default setvbuf() buffer
size observed with some specific implementation of fread(), or
confusing power-failure-sector-based atomicity with concurrent access
atomicity, or something else, but for the record what we actually see
in this scenario on ext4 is the old/new page contents mashed together
on much smaller boundaries (maybe cache lines), caused by duelling
concurrent memcpy() to/from, independent of any buffer/page-level
implementation details we might have been thinking of with that code.
Makes me wonder if it's even technically sound to examine the LSN.
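
To make the point about tear boundaries concrete, here is a toy model, not anything from the PostgreSQL tree and not a real concurrency test. It assumes (hypothetically) that both copies proceed in 64-byte chunks, that the writer has a small head start, and that the reader copies twice as fast and overtakes it. All names and parameters are made up for illustration.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size
HALF = 4096       # the "page-half" the checksum code is said to treat as atomic
CHUNK = 64        # assumed copy granularity (a typical cache-line size)


def simulate_torn_read(old, new, writer_head_start=10, reader_speed=2):
    """Interleave a writer copying `new` over `old` with a reader copying out.

    Deterministic stand-in for two concurrent memcpy() calls: per tick the
    writer overwrites one chunk and the reader copies `reader_speed` chunks.
    Returns the bytes the reader ends up with.
    """
    assert len(old) == len(new) and len(old) % CHUNK == 0
    page = bytearray(old)
    seen = bytearray(len(old))
    n_chunks = len(old) // CHUNK
    w = 0  # next chunk the writer will overwrite
    r = 0  # next chunk the reader will copy out

    def write_chunk(i):
        page[i * CHUNK:(i + 1) * CHUNK] = new[i * CHUNK:(i + 1) * CHUNK]

    # The writer starts first.
    while w < min(writer_head_start, n_chunks):
        write_chunk(w)
        w += 1

    # Then both run, with the reader faster than the writer.
    while r < n_chunks:
        if w < n_chunks:
            write_chunk(w)
            w += 1
        for _ in range(reader_speed):
            if r < n_chunks:
                seen[r * CHUNK:(r + 1) * CHUNK] = page[r * CHUNK:(r + 1) * CHUNK]
                r += 1
    return bytes(seen)


def tear_offsets(seen, new):
    """Byte offsets at which the reader's copy flips between new and old."""
    offsets = []
    prev = None
    for off in range(0, len(seen), CHUNK):
        is_new = seen[off:off + CHUNK] == new[off:off + CHUNK]
        if prev is not None and is_new != prev:
            offsets.append(off)
        prev = is_new
    return offsets


def halves_are_atomic(seen, old, new):
    """True if each 4KB half is entirely old or entirely new."""
    for off in range(0, len(seen), HALF):
        part = seen[off:off + HALF]
        if part != old[off:off + HALF] and part != new[off:off + HALF]:
            return False
    return True


if __name__ == "__main__":
    old = bytes(PAGE_SIZE)        # all 0x00
    new = b"\xff" * PAGE_SIZE     # all 0xff
    seen = simulate_torn_read(old, new)
    print(tear_offsets(seen, new), halves_are_atomic(seen, old, new))
```

With these made-up parameters the reader's copy switches from new to old at byte offset 1344, inside the first 4KB half, so a check that assumes each half is wholly old or wholly new does not hold. Real tears depend on the kernel, filesystem, and hardware, and can be messier than this single boundary.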
> It's also why we
> have to force full-page write on during a backup. But the whole thing
> is nasty because you can't really verify anything about the backup you
> just took. It may be full of gibberish blocks but don't worry because,
> if all goes well, recovery will fix it. But you won't really know
> whether recovery actually does fix it. You just kind of have to cross
> your fingers and hope.
Well, not without also scanning the WAL for FPIs, anyway... And
conceptually, that's why I think we probably want an 'FPI' of the
control file somewhere.