Re: Incremental backup from a streaming replication standby fails

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Incremental backup from a streaming replication standby fails
Date: 2024-07-19 15:32:06
Message-ID: a4391db7-d308-4814-ba6b-7c4e5ed59dc6@pgmasters.net
Lists: pgsql-hackers

On 7/19/24 21:52, Robert Haas wrote:
> On Mon, Jul 15, 2024 at 11:27 AM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>> On Sat, 2024-06-29 at 07:01 +0200, Laurenz Albe wrote:
>>> I played around with incremental backup yesterday and tried $subject
>>>
>>> The WAL summarizer is running on the standby server, but when I try
>>> to take an incremental backup, I get an error that I understand to mean
>>> that WAL summarizing hasn't caught up yet.
>>>
>>> I am not sure if that is working as designed, but if it is, I think it
>>> should be documented.
>>
>> I played with this some more. Here is the exact error message:
>>
>> ERROR: manifest requires WAL from final timeline 1 ending at 0/1967C260, but this backup starts at 0/1967C190
>>
>> By trial and error I found that when I run a CHECKPOINT on the primary,
>> taking an incremental backup on the standby works.
>>
>> I couldn't fathom the cause of that, but I think that that should either
>> be addressed or documented before v17 comes out.
>
> I had a feeling this was going to be confusing. I'm not sure what to
> do about it, but I'm open to suggestions.
>
> Suppose you take a full backup F; replay of that backup will begin
> with a checkpoint CF. Then you try to take an incremental backup I;
> replay will begin from a checkpoint CI. For the incremental backup to
> be valid, it must include all blocks modified after CF and before CI.
> But when the backup is taken on a standby, no new checkpoint is
> possible. Hence, CI will be the most recent restartpoint on the
> standby that has occurred before the backup starts. So, if F is taken
> on the primary and then I is immediately taken on the standby without
> the standby having done a new restartpoint, or if both F and I are
> taken on the standby and no restartpoint intervenes, then CF=CI. In
> that scenario, an incremental backup is pretty much pointless: every
> single incremental file would contain 0 blocks. You might as well just
> use the backup you already have, unless one of the non-relation files
> has changed. So, except in that unusual corner case, the fact that the
> backup fails isn't really costing you anything. In fact, there's a
> decent chance that it's saving you from taking a completely useless
> backup.
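
To make the failure above concrete, here is a rough Python model of the comparison the error message implies. It is a simplified sketch, not PostgreSQL's actual code. It assumes that a new incremental backup may not start before the end of the WAL range recorded in the prior backup's manifest, and that on a standby the start point is the redo LSN of the newest restartpoint, per the CF/CI explanation above. All function names are hypothetical, and 0/1967C300 is a made-up LSN.

```python
def parse_lsn(text: str) -> int:
    """Convert "hi/lo" LSN text (two hexadecimal 32-bit halves) to one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Render an integer LSN in the same "hi/lo" hexadecimal notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def standby_backup_start(restartpoint_redo_lsns: list[int]) -> int:
    """Hypothetical helper: a standby cannot force a new checkpoint, so the
    backup starts from the most recent restartpoint (the newest redo LSN)."""
    return max(restartpoint_redo_lsns)


def check_incremental_start(prior_end: int, start: int, tli: int = 1) -> None:
    """Hypothetical check: reject a backup that starts before the WAL range
    recorded in the prior backup's manifest ends."""
    if start < prior_end:
        raise ValueError(
            f"manifest requires WAL from final timeline {tli} ending at "
            f"{format_lsn(prior_end)}, but this backup starts at {format_lsn(start)}"
        )


# Values from the error report: no restartpoint since the prior backup ended.
prior_end = parse_lsn("0/1967C260")
restartpoints = [parse_lsn("0/1967C190")]
try:
    check_incremental_start(prior_end, standby_backup_start(restartpoints))
except ValueError as err:
    print("ERROR:", err)

# After a CHECKPOINT on the primary is replayed, the standby can create a newer
# restartpoint; 0/1967C300 is a made-up LSN used only for illustration.
restartpoints.append(parse_lsn("0/1967C300"))
check_incremental_start(prior_end, standby_backup_start(restartpoints))
```

In this model the failing case is simply one where the standby's start point has not moved past the prior backup. A CHECKPOINT on the primary helps because a standby can only create a restartpoint at a checkpoint record it has replayed, so the new record gives it a later point to start from.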

<snip>

> I think I'm a little too close to this to really know what the best
> thing to do is, so I'm happy to hear suggestions from you and others.

I think it would be enough just to add a hint such as:

HINT: This is possible when taking a backup on a standby with little or no
activity.

My guess is that this will be uncommon in production environments.

For example, over the years we (pgBackRest) have gotten numerous bug
reports that time-targeted PITR does not work. In every case we found
that the user was just testing procedures and the database had no
activity between backups -- therefore recovery had no commit timestamps
to use to end recovery. Test environments sometimes produce weird results.
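
A similarly rough Python model, not PostgreSQL's recovery code, shows why an idle test database defeats time-targeted PITR: a time target is only reached when replay encounters a commit record whose timestamp is past the target. The function name and the plain list of commit timestamps are assumptions for illustration, and the model only approximates the default recovery_target_inclusive = on behavior.

```python
from datetime import datetime
from typing import Optional


def time_target_stop_index(commit_times: list[datetime], target: datetime) -> Optional[int]:
    """Simplified model: return the index of the first commit whose timestamp
    is after the target (recovery stops before applying it), or None when no
    such commit exists, meaning the target is never reached."""
    for i, ts in enumerate(commit_times):
        if ts > target:
            return i
    return None


target = datetime(2024, 7, 19, 12, 0)

# Activity after the target: there is a commit record to stop at.
print(time_target_stop_index([datetime(2024, 7, 19, 11, 0), datetime(2024, 7, 19, 13, 0)], target))

# No commits between backups (an idle test database): nothing to stop at.
print(time_target_stop_index([], target))
```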

Having said that, I think it would be better if it worked even if it
does produce an empty backup. An empty backup wastes some disk space, but
if it reduces friction and saves an admin from having to intervene, then
it is probably worth it. I don't immediately see how to do that in a
reliable way, though, and in any case it seems like something to
consider for PG18.

Regards,
-David
