Quick Links

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From:	Asim R P <apraveen(at)pivotal(dot)io>
To:	Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc:	Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>, Paul Guo <pguo(at)pivotal(dot)io>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date:	2019-09-12 12:02:41
Message-ID:	CANXE4TcG68ksC5DCNatr1MZxgvuMpJi_ZfKN2f1ZG2BLyG2H_Q@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Sep 12, 2019 at 2:05 PM Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
wrote:
>
> Hello.
>
> At Wed, 11 Sep 2019 17:26:44 +0300, Anastasia Lubennikova <
a(dot)lubennikova(at)postgrespro(dot)ru> wrote in <
a82a896b-93f0-c26c-b941-f5665131381b(at)postgrespro(dot)ru>
> > 10.09.2019 14:42, Asim R P wrote:
> > > Hi Anastasia
> > >
> > > On Thu, Aug 22, 2019 at 9:43 PM Anastasia Lubennikova
> > > <a(dot)lubennikova(at)postgrespro(dot)ru <mailto:a(dot)lubennikova(at)postgrespro(dot)ru>>
> > > wrote:
> > > >
> > > > But during the review, I found a bug in the current implementation.
> > > > New behavior must apply to crash-recovery only, now it applies to
> > > > archiveRecovery too.
> > > > That can cause a silent loss of a tablespace during regular standby
> > > > operation
> > > > since it never calls CheckRecoveryConsistency().
>
> Yeah. We should take the same steps with redo operations on
> missing pages. Just record failure during inconsistent state then
> forget it if underlying tablespace is gone. If we had a record
> when we reached concsistency, we're in a serious situation and
> should stop recovery. log_invalid_page forget_invalid_pages and
> CheckRecoveryConsistency are the entry points of the feature to
> understand.
>

Yes, I get it now. I will adjust the patch written by Paul accordingly.

>
> # I remember that the start point of this patch is a crash after
> # table space drop subsequent to several operations within the
> # table space. Then, crash recovery fails at an operation in the
> # finally-removed tablespace. Is it right?
>

That's correct. Once the directories are removed from filesystem, any
attempt to replay WAL records that depend on their existence fails.

> > > But before that I'm revisiting another solution upthread, that of
> > > creating restart points when replaying create/drop database commands
> > > before making filesystem changes such as removing a directory. The
> > > restart points should align with checkpoints on master. The concern
> > > against this solution was creation of restart points will slow down
> > > recovery. I don't think crash recovery is affected by this solution
> > > because of the already existing enforcement of checkpoints. WAL
> > > records prior to a create/drop database will not be seen by crash
> > > recovery due to the checkpoint enforced during the command's normal
> > > execution.
> > >
> >
> > I haven't measured the impact of generating extra restart points in
> > previous solution, so I cannot tell whether concerns upthread are
> > justified. Still, I enjoy latest design more, since it is clear and
> > similar with the code of checking unexpected uninitialized pages. In
> > principle it works. And the issue, I described in previous review can
> > be easily fixed by several additional checks of InHotStandby macro.
>
> Generally we shouldn't trigger useless restart point for
> uncertain reasons. If standby crashes, it starts the next
> recovery from the latest *restart point*. Even in that case what
> we should do is the same.
>

The reason is quite clear to me - removing directories from filesystem
break the ability to replay WAL records second time. And we already create
checkpoints during normal operation in such a case, so crash recovery on a
master node does not suffer from this bug. I've attached a patch that
performs restart points during drop database replay, just for reference.
It passes both the TAP tests written by Kyotaro and Paul. I had to modify
drop database WAL record a bit.

Asim

Attachment	Content-Type	Size
v1-0001-Create-restartpoint-when-replaying-drop-database.patch	application/octet-stream	6.2 KB
v1-0002-Test-to-validate-replay-of-WAL-records-involving-.patch	application/octet-stream	7.2 KB

In response to

Re: standby recovery fails (tablespace related) (tentative patch and discussion) at 2019-09-12 08:35:36 from Kyotaro Horiguchi

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Euler Taveira	2019-09-12 12:39:00	Re: doc: pg_trgm missing description for GUC "pg_trgm.strict_word_similarity_threshold"
Previous Message	Daniel Verite	2019-09-12 11:55:37	Re: Create collation reporting the ICU locale display name