Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Paul Guo <pguo(at)pivotal(dot)io>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Asim Praveen <apraveen(at)pivotal(dot)io>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2019-04-23 05:31:58
Message-ID: CAEET0ZEcwz57z2yfWRds43b3TfQPPDSWmbjGmD43xRxLT41NDg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Kyotaro, ignoring the MakePGDirectory() failure will fix this database
create redo error, but I suspect some other kind of redo, which depends on
the files under the directory (they are not copied since the directory is
not created) and also cannot be covered by the invalid page mechanism,
could fail. Thanks.

On Mon, Apr 22, 2019 at 3:40 PM Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Oops! The comment in the previous patch is wrong.
>
> At Mon, 22 Apr 2019 16:15:13 +0900 (Tokyo Standard Time), Kyotaro
> HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <
> 20190422(dot)161513(dot)258021727(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > At Mon, 22 Apr 2019 12:36:43 +0800, Paul Guo <pguo(at)pivotal(dot)io> wrote in
> <CAEET0ZGpUrMGUzfyzVF9FuSq+zb=QovYa2cvyRnDOTvZ5vXxTw(at)mail(dot)gmail(dot)com>
> > > Please see my replies inline. Thanks.
> > >
> > > On Fri, Apr 19, 2019 at 12:38 PM Asim R P <apraveen(at)pivotal(dot)io> wrote:
> > >
> > > > On Wed, Apr 17, 2019 at 1:27 PM Paul Guo <pguo(at)pivotal(dot)io> wrote:
> > > > >
> > > > > create db with tablespace
> > > > > drop database
> > > > > drop tablespace.
> > > >
> > > > Essentially, that sequence of operations causes crash recovery to
> fail
> > > > if the "drop tablespace" transaction was committed before crashing.
> > > > This is a bug in crash recovery in general and should be reproducible
> > > > without configuring a standby. Is that right?
> > > >
> > >
> > > No. In general, checkpoint is done for
> drop_db/create_db/drop_tablespace on
> > > master.
> > > That makes the file/directory update-to-date if I understand the
> related
> > > code correctly.
> > > For standby, checkpoint redo does not ensure that.
> >
> > That's right partly. As you must have seen, fast shutdown forces
> > restartpoint for the last checkpoint and it prevents the problem
> > from happening. Anyway it seems to be a problem.
> >
> > > > Your patch creates missing directories in the destination. Don't we
> > > > need to create the tablespace symlink under pg_tblspc/? I would
> > > >
> > >
> > > 'create db with tablespace' redo log does not include the tablespace
> real
> > > directory information.
> > > Yes, we could add in it into the xlog, but that seems to be an
> overdesign.
> >
> > But I don't think creating directory that is to be removed just
> > after is a wanted solution. The directory most likely to be be
> > removed just after.
> >
> > > > prefer extending the invalid page mechanism to deal with this, as
> > > > suggested by Ashwin off-list. It will allow us to avoid creating
> > >
> > > directories and files only to remove them shortly afterwards when the
> > > > drop database and drop tablespace records are replayed.
> > > >
> > > >
> > > 'invalid page' mechanism seems to be more proper for missing pages of a
> > > file. For
> > > missing directories, we could, of course, hack to use that (e.g.
> reading
> > > any page of
> > > a relfile in that database) to make sure the tablespace create code
> > > (without symlink)
> > > safer (It assumes those directories will be deleted soon).
> > >
> > > More feedback about all of the previous discussed solutions is welcome.
> >
> > It doesn't seem to me that the invalid page mechanism is
> > applicable in straightforward way, because it doesn't consider
> > simple file copy.
> >
> > Drop failure is ignored any time. I suppose we can ignore the
> > error to continue recovering as far as recovery have not reached
> > consistency. The attached would work *at least* your case, but I
> > haven't checked this covers all places where need the same
> > treatment.
>
> The comment for the new function XLogMakePGDirectory is wrong:
>
> + * There is a possibility that WAL replay causes a creation of the same
> + * directory left by the previous crash. Issuing ERROR prevents the caller
> + * from continuing recovery.
>
> The correct one is:
>
> + * There is a possibility that WAL replay causes an error by creation of a
> + * directory under a directory removed before the previous crash. Issuing
> + * ERROR prevents the caller from continuing recovery.
>
> It is fixed in the attached.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2019-04-23 05:45:57 Re: pg_dump is broken for partition tablespaces
Previous Message Andrew Gierth 2019-04-23 05:23:13 Re: Symbol referencing errors