From: | "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com> |
---|---|
To: | 'Laurenz Albe' <laurenz(dot)albe(at)cybertec(dot)at>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Stronger safeguard for archive recovery not to miss data |
Date: | 2021-01-21 11:49:07 |
Message-ID: | OSBPR01MB488879E8A1807F595AD82B27EDA10@OSBPR01MB4888.jpnprd01.prod.outlook.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi
On Wed, Jan 20, 2021 9:03 PM Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> wrote:
>
> On Wed, 2021-01-20 at 13:10 +0900, Fujii Masao wrote:
> > + errhint("Run recovery again from a
> > + new base backup taken after setting wal_level higher than
> > + minimal")));
> >
> > Isn't it impossible to do this in normal archive recovery case? In
> > that case, since the server crashed and the database got corrupted,
> > probably we cannot take a new base backup.
> >
> > Originally even when users accidentally set wal_level to minimal, they
> > could start the server from the backup by disabling hot_standby and salvage
> the data.
> > But with the patch, we lost the way to do that. Right? I was wondering
> > that WARNING was used intentionally there for that case.
OK. I understand that this WARNING is necessary to give user a chance
to start up the server again and salvage his/her data,
when hot_standby=off and the server goes through a period of wal_level=minimal
during an archive recovery.
> I would argue that if you set your "wal_level" to minimal, do some work, reset it
> to "replica" and recover past that, two things could happen:
>
> 1. Since "archive_mode" was off, you are missing some WAL segments and
> cannot recover past that point anyway.
>
> 2. You are missing some relevant WAL records, and your recovered
> database is corrupted. This is very likely, because you probably
> switched to "minimal" with the intention to generate less WAL.
>
> Now I see your point that a corrupted database may be better than no database
> at all, but wouldn't you agree that a warning in the log (that nobody reads) is
> too little information?
>
> Normally we don't take such a relaxed attitude towards data corruption.
Yeah, we thought we needed stronger protection for that reason.
> Perhaps there could be a GUC "recovery_allow_data_corruption" to override this
> check, but I'd say that a warning is too little.
>
> > if (ControlFile->wal_level < WAL_LEVEL_REPLICA)
> > ereport(ERROR,
> > (errmsg("hot standby is not
> possible because wal_level was not set to \"replica\" or higher on the primary
> server"),
> > errhint("Either set wal_level
> > to \"replica\" on the primary, or turn off hot_standby here.")));
> >
> > With the patch, we never reach the above code?
>
> Right, that would have to go. I didn't notice that.
Adding a condition to check if "recovery_allow_data_corruption" is 'on' around the end of
CheckRequiredParameterValues() sounds safer for me too, although
implementing a new GUC parameter sounds bigger than what I expected at first.
The default of the value should be 'off' to protect users from getting the corrupted server.
Does everyone agree with this direction ?
Best Regards,
Takamichi Osumi
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2021-01-21 11:53:24 | Re: Is Recovery actually paused? |
Previous Message | Dean Rasheed | 2021-01-21 11:11:55 | Re: PoC/WIP: Extended statistics on expressions |