Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: David Steele <david(at)pgmasters(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Stephen Frost <sfrost(at)snowman(dot)net>, Daniel Gustafsson <daniel(at)yesql(dot)se>, "Anton A(dot) Melnikov" <aamelnikov(at)inbox(dot)ru>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"
Date: 2023-10-17 17:47:23
Message-ID: CA+TgmoaUEj4s2Jt7W5uuW0FBry5Ff4jT=uKhO2DsDYUBnUfJtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 16, 2023 at 6:48 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> I pushed the retry-loop-in-frontend-executables patch and the
> missing-locking-in-SQL-functions patch yesterday. That leaves the
> backup ones, which I've rebased and attached, no change. It sounds
> like we need some more healthy debate about that backup label idea
> that would mean we don't need these (two birds with one stone), so
> I'll just leave these here to keep the cfbot happy in the meantime.

0002 has no comments at all, and the commit message is not specific
enough for me to understand what problem it fixes. I suggest adding
some comments and fixing the commit message. I'm also not very sure
whether the change to the signature of sendFileWithContent is really
the best way to deal with the control file maybe containing a zero
byte ... but I'm quite sure that if we're going to do it that way, it
needs a comment. But maybe we should do something else that would
require less explanation, like having the caller always pass the
length.

Regarding 0001, the way you've hacked up update_controlfile() doesn't
fill me with joy. It's nice if code that is common to the frontend and
the backend does the same thing in both cases rather than, say, having
an extra argument that only works in one case but not the other. I bet
this could be refactored to make it nicer, e.g. have one function that
takes an exact pathname at which the control file is to be written and
then other functions that use it as a subroutine.

Personally, I think the general idea of 0001 is better than any
competing proposal on the table. In the case of pg_basebackup, we
could fix the server to perform appropriate locking around reading the
control file, so that the version sent to the client doesn't get torn.
But if a backup is made by calling pg_backup_start() and copying
$PGDATA, that isn't going to work. To fix that, we need to either make
the backup procedure more complicated and essentially treat the
control file as a special case, or we need to do something like this.
I think this is better; but as you mention, opinions vary on that.

Life would be a lot easier here if we could get rid of the low-level
backup API and just have pg_basebackup DTWT, but that seems like a
completely non-viable proposal.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-10-17 17:57:22 Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Previous Message Maciek Sakrejda 2023-10-17 16:53:03 Re: run pgindent on a regular basis / scripted manner