Re: Recovery of .partial WAL segments

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Stefan Fercot <stefan(dot)fercot(at)protonmail(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Recovery of .partial WAL segments
Date: 2024-10-18 13:56:53
Message-ID: CAEze2Wi30VZDKxaR0wRGNx7LJ=heiwPpALhLFTu2Lg86QanCXA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 5 Apr 2024 at 11:45, Stefan Fercot <stefan(dot)fercot(at)protonmail(dot)com> wrote:
>
> Dear hackers,
>
> Generating a ".partial" WAL segment is pretty common nowadays (using pg_receivewal or during standby promotion).
> However, we currently don't do anything with it unless the user manually removes that ".partial" extension.
>
> The 028_pitr_timelines tests are highlighting that fact: with test data being being in 000000020000000000000003 and 000000010000000000000003.partial, a recovery following the latest timeline (2) will succeed but fail if we follow the current timeline (1).
>
> By simply trying to fetch the ".partial" file in XLogFileRead, we can easily recover more data and also cover that (current timeline) recovery case.
>
> So, this proposed patch makes XLogFileRead try to restore ".partial" WAL archives and adds a test to 028_pitr_timelines using current recovery_target_timeline.

Does this path only get hit when we don't already have any WAL
segments (or partial segments) left for that timeline? I'm a bit
worried about overwriting existing (partial) segments that may have
more WAL than what we can get from archives.

(patch v2)
> + restoredArchivedFile = !RestoreArchivedFile(path, xlogfname,
> + "RECOVERYXLOG",
> + wal_segment_size,
> + InRedo) &&
> + !RestoreArchivedFile(path, partialxlogfname,
> "RECOVERYXLOG",
> wal_segment_size,
> - InRedo))
> + InRedo);

The value of restoredArchiveFile is inverted with what it indicates:
It is true when we failed to restore an archived xlog segment, and
false if we did succeed.

I'm also not a fan of the additional allocation of partialxlogfname in
this code. It could well do without, by "just" reusing the xlogfname
scratch space when we fail to recover the full segment.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Aleksander Alekseev 2024-10-18 14:20:42 Re: [PATCH] Add get_bytes() and set_bytes() functions
Previous Message jian he 2024-10-18 13:52:43 Re: New "raw" COPY format