Re: Basebackup fails without useful error message

From: Koen De Groote <kdg(dot)dev(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>, PostgreSQL General <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: Basebackup fails without useful error message
Date: 2024-10-22 19:50:24
Message-ID: CAGbX52EO3zCuNEbbLOkKjXOxbVUnQf-P=vkGLFOuxDYg1nFiuw@mail.gmail.com
Lists: pgsql-general

Hello David,

I saw the backup fail. The backup logged that the walsender was terminated,
and correlating the moment of failure with my storage metrics shows that
the storage was under very high IOWAIT at that time. This was
network-mounted storage.

The backup process continued, but because WAL could not be streamed
without error (due to a local issue), the entire backup was marked as
failed. In this case, pg_basebackup deletes the backup at the end. There's
no flag to control this final behavior.

I'll soon be testing a restore without streaming WAL, since the actual
restore I perform doesn't use the pg_wal.tar.gz file; it gets the archived
WAL instead. At least I think it doesn't need it, hence the need for testing.
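
For reference, a minimal sketch of how such a test backup could be invoked.
It only builds the command line and environment and runs nothing. The helper
name, the target directory and the choice of tar/gzip output are assumptions
made for illustration. Passing wal_sender_timeout through PGOPTIONS assumes a
server version where that setting can be changed per connection (PostgreSQL 12
and later, as far as I know).

```python
import os
import shlex


def build_basebackup_command(target_dir, wal_method="none", wal_sender_timeout=None):
    """Return (argv, env) for a pg_basebackup run; nothing is executed here.

    Hypothetical helper for illustration only.
    """
    # pg_basebackup accepts none, fetch or stream for --wal-method.
    if wal_method not in ("none", "fetch", "stream"):
        raise ValueError(f"unsupported WAL method: {wal_method}")

    argv = [
        "pg_basebackup",
        "--pgdata", target_dir,      # output directory (assumed path)
        "--format", "tar",           # tar output, as implied by pg_wal.tar.gz
        "--gzip",
        "--wal-method", wal_method,  # "none": rely on the WAL archive at restore
    ]

    env = dict(os.environ)
    if wal_sender_timeout is not None:
        # Assumption: the server allows this setting per connection.
        # This replaces any PGOPTIONS already present in the environment.
        env["PGOPTIONS"] = f"-c wal_sender_timeout={wal_sender_timeout}"
    return argv, env


if __name__ == "__main__":
    argv, env = build_basebackup_command(
        "/backups/test", wal_method="none", wal_sender_timeout=0
    )
    print(shlex.join(argv))
    print("PGOPTIONS=" + env["PGOPTIONS"])
```

With wal_method="none" the backup contains no WAL at all, so the restore has
to get every required segment from the archive. That is exactly the
assumption the test is meant to confirm.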

Regards,
Koen De Groote

On Tue, Oct 22, 2024 at 12:34 AM David G. Johnston <
david(dot)g(dot)johnston(at)gmail(dot)com> wrote:

> On Sunday, October 20, 2024, Koen De Groote <kdg(dot)dev(at)gmail(dot)com> wrote:
>>
>>
>> I'm going to be testing this. If someone could confirm that this is how
>> writing WAL files works, that being: that it is only considered "done" when
>> the archive_command is done, that would be great.
>>
>
> The archiving of WAL files by the primary does not involve a replication
> connection of any sort and thus the “WAL sender” settings are not relevant
> to it; or, here, whether or not you are archiving your WAL is immaterial
> since you are streaming it as it gets produced.
>
> If you are streaming WAL it seems highly unusual that you’d end up in a
> situation where the connection goes idle long enough that it gets killed,
> especially if the backup is still happening. I’d probably go with
> performing the backup under a disabled (or extremely large?) timeout though
> and move on to other things.
>
> That isn’t to say I fully understand what actually is happening here…
>
> David J.
>
>
