RE: Stronger safeguard for archive recovery not to miss data

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "masao(dot)fujii(at)oss(dot)nttdata(dot)com" <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, "david(at)pgmasters(dot)net" <david(at)pgmasters(dot)net>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "laurenz(dot)albe(at)cybertec(dot)at" <laurenz(dot)albe(at)cybertec(dot)at>
Subject: RE: Stronger safeguard for archive recovery not to miss data
Date: 2021-04-06 06:59:08
Message-ID: OSBPR01MB488847BC24220F046DB6BEB4ED769@OSBPR01MB4888.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tuesday, April 6, 2021 3:24 PM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> At Tue, 6 Apr 2021 04:11:35 +0000, "osumi(dot)takamichi(at)fujitsu(dot)com"
> <osumi(dot)takamichi(at)fujitsu(dot)com> wrote in
> > On Tuesday, April 6, 2021 9:41 AM Fujii Masao
> > <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
> > > On 2021/04/05 23:54, osumi(dot)takamichi(at)fujitsu(dot)com wrote:
> > > >> This makes me think that we should document this risk.... Thought?
> > > > +1. We should notify the risk when user changes
> > > > the wal_level higher than minimal to minimal to invoke a
> > > > carefulness of user for such kind of operation.
> > >
> > > I removed the HINT message "or recover to the point in ..." and
> > > added the following note into the docs.
> > >
> > > Note that changing <varname>wal_level</varname> to
> > > <literal>minimal</literal> makes any base backups taken before
> > > unavailable for archive recovery and standby server, which may
> > > lead to database loss.
> > Thank you for updating the patch. Let's make the sentence more strict.
> >
> > My suggestion for this explanation is
> > "In order to prevent database corruption, changing wal_level to
> > minimal from higher level in the middle of WAL archiving requires
> > careful attention. It makes any base backups taken before the
> > operation unavailable for archive recovery and standby server. Also,
> > it may lead to whole database loss when archive recovery fails with an
> > error for that change.
> > Take a new base backup immediately after making wal_level back to higher
> level."
>
> The first sentense looks like somewhat nanny-ish. The database is not
> corrupt at the time of this error.
Yes. Excuse me for misleading sentence.
I just wanted to write why the error was introduced,
but it was not necessary.
We should remove and fix the first part of the sentence.

> We just lose updates after the last read
> segment at this point. As Fujii-san said, we can continue recoverying using
> crash recovery and we will reach having a corrupt database after that.
OK. Thank you for explanation.

> About the last sentence, I prefer more flat wording, such as "You need to take
> a new base backup..."
Either is fine to me.

> > Then, we can be consistent with our new hint message, "Use a backup
> > taken after setting wal_level to higher than minimal.".
>
> > Is it better to add something similar to "Take an offline backup when
> > you stop the server and change the wal_level" around the end of this part as
> another option for safeguard, also?
>
> Backup policy is completely a matter of DBAs.
OK. No problem. No need to add it.

> If flipping wal_level alone
> highly causes unstartable corruption,,, I think it is a bug.
> > For the performance technique part, what we need to explain is same.
>
> Might be good, but in simpler wording.
Yeah, I agree.

> > Another minor thing I felt we need to do might be to add double quotes to
> wrap minimal in errhint.
>
> Since the error about hot_standby has gone, either will do for me.
Thanks for sharing your thoughts.

Best Regards,
Takamichi Osumi

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2021-04-06 07:00:38 Re: postgres_fdw: IMPORT FOREIGN SCHEMA ... LIMIT TO (partition)
Previous Message Amit Kapila 2021-04-06 06:49:20 Re: Replication slot stats misgivings