From: | Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru> |
---|---|
To: | Alexander Kukushkin <cyberdemn(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Concurrency issue in pg_rewind |
Date: | 2020-09-17 13:05:28 |
Message-ID: | 30ec75b9bd9bfab1e83e7168dc6d6ddc@postgrespro.ru |
Lists: | pgsql-hackers |
On 2020-09-17 15:27, Alexander Kukushkin wrote:
> On Thu, 17 Sep 2020 at 14:04, Alexey Kondratov
> <a(dot)kondratov(at)postgrespro(dot)ru> wrote:
>
>> With --restore-target-wal pg_rewind tries to call restore_command
>> on its own, and this can happen at two stages:
>>
>> 1) When pg_rewind is trying to find the last checkpoint preceding a
>> divergence point. In that case the file map is not even initialized
>> yet. Thus, all WAL segments fetched at this stage will be present in
>> the file map created later.
>
> Nope, it will fetch the files you requested, and in addition to that
> it will leave a child process running in the background which is doing
> the prefetch (manipulating pg_wal/.wal-g/...)
>
>>
>> 2) When it creates a data pages map. It should traverse WAL from the
>> last common checkpoint till the final shutdown point in order to find
>> all modified pages on the target. At this stage pg_rewind only updates
>> info about data segments in the file map. That way, I see a minor
>> problem that WAL segments fetched at this stage would not be deleted,
>> since they are absent in the file map.
>>
>> Anyway, pg_rewind deletes neither WAL segments nor any other files
>> in the middle of the file map creation, so I cannot imagine how
>> it can get into the same trouble on its own.
>
> When pg_rewind was creating the map, some temporary files were there,
> because the forked child process of wal-g was still running.
> When the wal-g child process exits, it removes some of these files.
> Specifically, it was trying to prefetch 0000008400000A7600000024 into
> the pg_wal/.wal-g/prefetch/running/0000008400000A7600000024, but
> apparently the file wasn't available on S3 and prefetch failed,
> therefore the empty file was removed.
>
I do understand how you got into this problem with wal-g. This part of
my answer was about bare postgres and pg_rewind. And my point was that
from my perspective pg_rewind with --restore-target-wal cannot get into
the same trouble on its own, without the 'help' of side tools like
wal-g.
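
To make the interleaving discussed above concrete, here is a minimal
Python sketch of it. It is an illustration only, not code from pg_rewind
or wal-g: the helper names (build_file_map, process_file_map, simulate)
are hypothetical, and the background prefetcher is replaced by an
explicit os.remove() call between the two passes so that the result is
deterministic.

```python
import os
import tempfile


def build_file_map(root):
    """Record the relative path and size of every regular file under root."""
    file_map = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            file_map[os.path.relpath(path, root)] = os.path.getsize(path)
    return file_map


def process_file_map(root, file_map):
    """Revisit every mapped file and return the entries that have vanished."""
    missing = []
    for rel in sorted(file_map):
        try:
            os.stat(os.path.join(root, rel))
        except FileNotFoundError:
            missing.append(rel)
    return missing


def simulate():
    with tempfile.TemporaryDirectory() as root:
        # Assumed layout, taken from the path quoted in this thread.
        running = os.path.join(root, "pg_wal", ".wal-g", "prefetch", "running")
        os.makedirs(running)
        temp_segment = os.path.join(running, "0000008400000A7600000024")
        open(temp_segment, "w").close()  # empty placeholder left by the prefetch

        file_map = build_file_map(root)  # the map now lists the temporary file
        os.remove(temp_segment)          # stand-in for the child process cleanup
        return process_file_map(root, file_map)


missing = simulate()
print(missing)
```

The second pass reports the temporary file as missing only because
something outside the mapping process removed it between the two passes;
without that external removal the list stays empty.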
Regards
--
Alexey Kondratov
Postgres Professional https://www.postgrespro.com
Russian Postgres Company