Re: Stronger safeguard for archive recovery not to miss data

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: masao(dot)fujii(at)oss(dot)nttdata(dot)com
Cc: osumi(dot)takamichi(at)fujitsu(dot)com, david(at)pgmasters(dot)net, pgsql-hackers(at)lists(dot)postgresql(dot)org, laurenz(dot)albe(at)cybertec(dot)at
Subject: Re: Stronger safeguard for archive recovery not to miss data
Date: 2021-04-05 07:13:27
Message-ID: 20210405.161327.326176026982929551.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Mon, 5 Apr 2021 12:34:53 +0900, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> wrote in
>
>
> On 2021/04/04 11:58, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> >> IMO it's better to comment why this server restart is necessary.
> >> As far as I understand correctly, this is necessary to ensure the WAL
> >> file
> >> containing the record about the change of wal_level (to minimal) is
> >> archived,
> >> so that the subsequent archive recovery will be able to replay it.
> > OK, added some comments. Further, I felt the way I wrote this part was
> > not good at all and self-evident
> > and developers who read this test would feel uneasy about that point.
> > So, a little bit fixed that test so that we can get clearer conviction
> > for wal archive.
>
> LGTM. Thanks for updating the patch!
>
> Attached is the updated version of the patch. I applied the following
> changes.

+ errhint("Use a backup taken after setting wal_level to higher than minimal "
+ "or recover to the point in time before wal_level was changed to minimal even though it may cause data loss.")));

Looking the HINT message, I thought that it's hard to find where up to
I should recover. The caller knows the LSN of last PARAMETER_CHANGE
record so we can show that as the recovery-end LSN. On the other
hand, I'm not sure it's worth considering tough, we can reach here
when starting archive recovery on a server with wal_level=minimal. We
can pass 0 as LSN to notify that case.

If we do that, we can emit more clear message like the following.

WAL-while-minimal case
FATAL: WAL generated with wal_level=minimal at 0/40000A0, data may be missing
HINT: Use a backup later than, or recover up to before that LSN.

Mis-setting case
FATAL: archive recovery is not available on server with wal_level=minimal
HINT: Start this server with setting wal_level=replica or higher

The value "replica" is not double-quoted there (since before this
patch), but double-quoted in another error message about hot standby
just below. Maybe we should let them in the same style.

> Could you review this version? Barring any objection, I'm thinking to
> commit this.
>
> +sub check_wal_level
> +{
> + my ($target_wal_level, $explanation) = @_;
> +
> + is( $node->safe_psql(
> + 'postgres', q{show wal_level}),
> + $target_wal_level,
> + $explanation);
>
> Do we really need this test? This test doesn't seem to be directly
> related
> to the test purpose. It seems to be testing the behavior of the
> PostgresNode
> functions. So I removed this from the patch.

+1.

> +# This protection should apply to recovery mode
> +my $another_node = get_new_node('another_node');
>
> The same test is performed twice with different settings. So isn't it
> better
> to merge the code for both tests into one common function, to simplify
> the code? I did that.

Sounds reasonable for that size.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2021-04-05 07:14:06 Re: Replication slot stats misgivings
Previous Message Amit Kapila 2021-04-05 07:06:07 Re: Any objections to implementing LogicalDecodeMessageCB for pgoutput?