From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Daniel Gustafsson <daniel(at)yesql(dot)se>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt" |
Date: | 2023-07-24 20:17:56 |
Message-ID: | CA+TgmoZsOzpiH9Zc4LOnuON1GXM+x830g-jCbyhp-ZXeUo_0kg@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Jul 21, 2023 at 8:52 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> Idea for future research: Perhaps pg_backup_stop()'s label-file
> output should include the control file image (suitably encoded)? Then
> the recovery-from-label code could completely ignore the existing
> control file, and overwrite it using that copy. It's already
> partially ignoring it, by using the label file's checkpoint LSN
> instead of the control file's. Perhaps the captured copy could
> include the correct LSN already, simplifying that code, and the low
> level backup procedure would not need any additional steps or caveats.
> No more atomicity problem for low-level-backups... but probably not
> something we would back-patch, for such a rare failure mode.
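Thomas's suggestion can be sketched as follows. This is a minimal Python illustration under stated assumptions, not PostgreSQL code. The `CHECKPOINT LOCATION` line mirrors the existing backup_label format. The `CONTROL FILE:` line and all helper names are hypothetical, and base64 is just one possible "suitable encoding". The backup side appends the control-file image captured at pg_backup_stop() time. The recovery side prefers that image over whatever control file the backup tool happened to copy.

```python
import base64
from typing import Optional

# Hypothetical label-file line; not an existing PostgreSQL field.
CONTROL_KEY = "CONTROL FILE: "

def append_control_image(label_text: str, control_image: bytes) -> str:
    """Backup side: embed the control-file image captured at pg_backup_stop() time."""
    encoded = base64.b64encode(control_image).decode("ascii")
    return label_text + CONTROL_KEY + encoded + "\n"

def control_image_from_label(label_text: str) -> Optional[bytes]:
    """Recovery side: return the embedded image, or None for an old-style label."""
    for line in label_text.splitlines():
        if line.startswith(CONTROL_KEY):
            return base64.b64decode(line[len(CONTROL_KEY):])
    return None

def restore_control_file(label_text: str, copied_control_file: bytes) -> bytes:
    """Ignore the copied (possibly torn) control file when the label carries an image."""
    image = control_image_from_label(label_text)
    return image if image is not None else copied_control_file

label = "CHECKPOINT LOCATION: 0/2000060\n"
image = bytes(range(256)) * 32          # stand-in control-file contents
new_label = append_control_image(label, image)
```

With an embedded image, `restore_control_file(new_label, b"torn copy")` returns `image`; with an old-style label it falls back to the copied file, so the low-level procedure itself would need no extra steps.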
I don't really know what the solution is, but this is a general
problem with the low-level backup API, and I think it sucks pretty
hard. Here, we're talking about the control file, but the same problem
exists with the data files. We try to work around that but it's all
hacks. Unless your backup tool has special magic powers of some kind,
you can't take a backup using either pg_basebackup or the low-level
API and then check that individual blocks have valid checksums, or
that they have sensible, interpretable contents, because they might
not. (Yeah, I know we have code to verify checksums during a base
backup, but as discussed elsewhere, it doesn't work.) It's also why we
have to force full-page writes on during a backup. But the whole thing
is nasty because you can't really verify anything about the backup you
just took. It may be full of gibberish blocks, but don't worry, because,
if all goes well, recovery will fix it. But you won't really know
whether recovery actually does fix it. You just kind of have to cross
your fingers and hope.
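To make the torn-block problem concrete, here is a toy Python simulation. Assumptions: pages are 8192 bytes with a 4-byte checksum at the front, and zlib's CRC-32 stands in for PostgreSQL's real page checksum, which uses a different algorithm and a different on-page layout. A file-level copy taken while the server rewrites a block can mix two versions, so the copy fails its checksum even though the cluster itself is fine. Replay then restores a consistent page, which is the job forced full-page writes do.

```python
import zlib

BLCKSZ = 8192  # PostgreSQL's default block size

def make_page(fill: int) -> bytes:
    """Toy page: 4-byte stand-in checksum followed by a payload of one repeated byte."""
    payload = bytes([fill]) * (BLCKSZ - 4)
    return zlib.crc32(payload).to_bytes(4, "little") + payload

def page_ok(page: bytes) -> bool:
    """Recompute the stand-in checksum and compare it with the stored one."""
    return int.from_bytes(page[:4], "little") == zlib.crc32(page[4:])

old_page = make_page(0xAA)  # block contents before a concurrent write
new_page = make_page(0xBB)  # block contents after the write completes

# A copy taken mid-write can see the new first half and the old second half.
torn = new_page[: BLCKSZ // 2] + old_page[BLCKSZ // 2 :]

# Simplified stand-in for WAL replay: the full-page image logged for this
# block (plus any later records) brings it back to a consistent version.
replayed = new_page

print(page_ok(old_page), page_ok(new_page), page_ok(torn), page_ok(replayed))
```

A checksum scan of the copied files flags `torn` as corrupt, yet after replay the block is fine; conversely, a clean scan would say nothing about whether replay will actually succeed.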
It's unclear to me how we could do better, especially when using the
low-level API. BASE_BACKUP could read via shared_buffers instead of
the FS, and I think that might be a good idea if we can defend
adequately against cache poisoning, but with the low-level API someone
may just be calling a FS-level snapshot primitive. Unless we're
prepared to pause all writes while that happens, I don't know how to
do better.
--
Robert Haas
EDB: http://www.enterprisedb.com