From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | Alexander Kukushkin <cyberdemn(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Oleksandr Shulgin <oleksandr(dot)shulgin(at)zalando(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Concurrency issue in pg_rewind |
Date: | 2020-10-07 19:13:12 |
Message-ID: | 20201007191312.GB3063@tamriel.snowman.net |
Lists: | pgsql-hackers |
Greetings,

* Heikki Linnakangas (hlinnaka(at)iki(dot)fi) wrote:
> On 18/09/2020 10:17, Alexander Kukushkin wrote:
> >At the same time, pg_rewind due to such "fatal" error leaves PGDATA in
> >an inconsistent state with empty pg_control file, this is totally bad
> >and easily fixable. We want the specific file to be absent and it is
> >already absent, why should it be a fatal error and not warning?
>
> Whenever pg_rewind runs into something unexpected, it fails loudly, so that
> the administrator can re-initialize from a base backup. That's the general
> rule. If a file goes missing while pg_rewind is running, that is unexpected.
> It could be a sign that the server was started concurrently, or another
> pg_rewind was started against it, for example.

Agreed.

> I feel that we could make an exception of some sort here, but I'm not sure
> what exactly. I don't feel comfortable just downgrading the unexpected
> ENOENT on unlink() to a warning in all cases. Besides, scary warnings that you
> routinely ignore are not good either.

I also dislike the idea of downgrading this.

> I have a hard time coming up with a general rule and justification that's
> not just "do X because WAL-G does Y". pg_rewind failing because WAL-G
> removed a file unexpectedly is one problem, but another is that the
> restore_command might get confused if a pg_rewind removes a file that
> restore_command needs. This is hard when restore_command does things in the
> background, and there's no communication between the background process and
> pg_rewind.

I would also point out that WAL-G isn't the only backup/restore tool
that does pre-fetching. pgBackRest does too, but we pre-fetch into an
independent spool directory, because these tools really should *not* be
modifying the PGDATA directory during restore_command.
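
To make the spool-directory idea concrete, here is a minimal sketch in Python. It is not pgBackRest's or WAL-G's actual code. The function names, the `fetch` callback, and the `.tmp` suffix are all hypothetical and only illustrate the shape: pre-fetched segments land in a directory outside PGDATA, and the only file written under PGDATA is the destination path (%p) that PostgreSQL itself asked restore_command to populate.

```python
import os
import shutil
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical callback: downloads one WAL segment from the archive to `target`.
Fetch = Callable[[str, Path], None]


def check_spool_outside_pgdata(spool_dir: Path, pgdata: Path) -> None:
    """Refuse to run if the spool directory is PGDATA or lives inside it."""
    spool = spool_dir.resolve()
    data = pgdata.resolve()
    if spool == data or data in spool.parents:
        raise ValueError(f"spool directory {spool} must not be inside PGDATA {data}")


def prefetch(names: Iterable[str], spool_dir: Path, fetch: Fetch) -> None:
    """Download upcoming segments into the spool directory only."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        final = spool_dir / name
        if final.exists():
            continue  # already spooled by an earlier run
        tmp = spool_dir / (name + ".tmp")
        fetch(name, tmp)
        # Rename within the spool directory so readers never see a partial file.
        os.replace(tmp, final)


def restore(name: str, dest: Path, spool_dir: Path, fetch: Fetch) -> None:
    """Serve one restore_command request; `dest` is the path PostgreSQL passed as %p."""
    spooled = spool_dir / name
    if spooled.exists():
        shutil.copyfile(spooled, dest)
        spooled.unlink()  # cleanup happens in the spool, never in PGDATA
    else:
        fetch(name, dest)  # not pre-fetched: fetch straight to the requested path
```

With this layout, nothing the background pre-fetcher creates or removes is visible to pg_rewind, so there is no file under PGDATA that can appear or vanish behind its back.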

I'm really disinclined to make concessions so that external tools can
start writing into directories that they shouldn't be writing into, and
in my view this goes for removing .ready files too. Yes, you can do such
things and maybe things will work, but if you run into issues, that's on
you for making changes to the PGDATA directory. It's not on PG to try
and guess at what you, or any other external tool, did and magically
work around it or keep it working.

Thanks,
Stephen