From: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
---|---|
To: | Michael Paquier <michael(at)paquier(dot)xyz> |
Cc: | Magnus Hagander <magnus(at)hagander(dot)net>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Trap errors from streaming child in pg_basebackup to exit early |
Date: | 2022-02-18 21:00:43 |
Message-ID: | B7C15416-BD61-4926-9843-5C557BCD7007@yesql.se |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> On 16 Feb 2022, at 08:27, Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Wed, Sep 29, 2021 at 01:18:40PM +0200, Daniel Gustafsson wrote:
>> So there is one mention of a background WAL receiver already in there, but it's
>> pretty inconsistent as to what we call it. For now I've changed the messaging
>> in this patch to say "background process", leaving making this all consistent
>> for a follow-up patch.
>>
>> The attached fixes the above, as well as the typo mentioned off-list and is
>> rebased on top of todays HEAD.
>
> I have been looking a bit at this patch, and did some tests on Windows
> to find out that this is able to catch the failure of the thread
> streaming the WAL segments in pg_basebackup, avoiding a completion of
> the base backup, while HEAD waits until the backup finishes. Testing
> this scenario is actually simple by issuing pg_terminate_backend() on
> the WAL sender that streams the WAL with START_REPLICATION, while
> throttling the base backup.
Great, thanks!
> Could you add a test to automate this scenario? As far as I can see,
> something like the following should be stable even for Windows:
> 1) Run a pg_basebackup in the background with IPC::Run, using
> --max-rate with a minimal value to slow down the base backup, for slow
> machines. 013_crash_restart.pl does that as one example with $killme.
> 2) Find out the WAL sender doing START_REPLICATION in the backend, and
> issue pg_terminate_backend() on it.
> 3) Use a variant of pump_until() on the pg_basebackup process and
> check after one or more failure patterns. We should refactor this
> part, actually. If this new test uses the same logic, that would make
> three tests doing that with 022_crash_temp_files.pl and
> 013_crash_restart.pl. The CI should be fine to provide any feedback
> with the test in place, though I am fine to test things also in my
> box.
This is good idea, I was going in a different direction earlier with a test but
this is cleaner. The attached 0001 refactors pump_until; 0002 fixes a trivial
spelling error found while hacking; and 0003 is the previous patch complete
with a test that passes on Cirrus CI.
--
Daniel Gustafsson https://vmware.com/
Attachment | Content-Type | Size |
---|---|---|
v5-0001-Add-function-to-pump-IPC-process-until-string-mat.patch | application/octet-stream | 9.0 KB |
v5-0002-Remove-duplicated-word-in-comment.patch | application/octet-stream | 921 bytes |
v5-0003-Quick-exit-on-log-stream-child-exit-in-pg_basebac.patch | application/octet-stream | 5.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2022-02-18 21:02:23 | Re: Small TAP tests cleanup for Windows and unused modules |
Previous Message | Nathan Bossart | 2022-02-18 20:54:29 | Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file |