Re: standby recovery fails (tablespace related) (tentative patch and discussion)

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: alvherre(at)alvh(dot)no-ip(dot)org, michael(at)paquier(dot)xyz, rjuju123(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Date: 2022-04-04 08:29:48
Message-ID: 20220404.172948.678193664696814690.horikyota.ntt@gmail.com
Lists: pgsql-hackers

At Fri, 1 Apr 2022 14:51:58 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> On Fri, Apr 1, 2022 at 12:22 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > By the way, may I ask how we should fix this? The existing recovery code
> > already sometimes creates just-to-be-deleted files in a real directory in
> > pg_tblspc, and otherwise skips applying WAL records to
> > nonexistent heap pages. It is a "mixed" approach.
>
> Can you be more specific about where we have each behavior now?

They're done in XLogReadBufferExtended.

The second behavior happens here, in xlogutils.c:
> /* hm, page doesn't exist in file */
> if (mode == RBM_NORMAL)
> {
> log_invalid_page(rnode, forknum, blkno, false);
+ Assert(0);
> return InvalidBuffer;

With the assertion added, 015_promotion_pages.pl crashes, so this path
is actually taken: the missing page is not created, and the redo
action that follows on that page is skipped.
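To make the second behavior concrete, here is a toy model in plain C of the branch quoted above. It is only a sketch under stated assumptions, not PostgreSQL code: ToyRel, toy_read_buffer_for_redo() and the fixed-size invalid-page list are hypothetical stand-ins for the relation, XLogReadBufferExtended() and log_invalid_page(). When the requested block lies past the end of the file and the mode is RBM_NORMAL, the block number is remembered and an invalid buffer is returned, so the caller skips the redo action. Other modes extend the file until the block exists.

```c
#include <assert.h>

/* Hypothetical stand-ins; these are NOT the real PostgreSQL types. */
typedef enum { TOY_RBM_NORMAL, TOY_RBM_ZERO_AND_LOCK } ToyReadMode;

#define TOY_INVALID_BUFFER (-1)
#define TOY_MAX_INVALID 16

typedef struct
{
    unsigned int nblocks;   /* current length of the relation file */
} ToyRel;

/* Blocks that were requested but missing (cf. log_invalid_page()). */
static unsigned int toy_invalid_blocks[TOY_MAX_INVALID];
static int toy_ninvalid = 0;

static void
toy_log_invalid_page(unsigned int blkno)
{
    if (toy_ninvalid < TOY_MAX_INVALID)
        toy_invalid_blocks[toy_ninvalid++] = blkno;
}

/*
 * Returns a "buffer" (here simply the block number) or TOY_INVALID_BUFFER.
 * A missing page in RBM_NORMAL mode is remembered and skipped; any other
 * mode extends the relation so that the page exists afterwards.
 */
static int
toy_read_buffer_for_redo(ToyRel *rel, unsigned int blkno, ToyReadMode mode)
{
    if (blkno < rel->nblocks)
        return (int) blkno;             /* page exists: replay onto it */

    if (mode == TOY_RBM_NORMAL)
    {
        /* hm, page doesn't exist in file */
        toy_log_invalid_page(blkno);
        return TOY_INVALID_BUFFER;      /* caller skips the redo action */
    }

    rel->nblocks = blkno + 1;           /* OK to extend the file */
    return (int) blkno;
}
```

Note that the skipped page leaves the file length unchanged; only the non-RBM_NORMAL path grows the relation.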

The first behavior is described in the following comment:

> * Create the target file if it doesn't already exist. This lets us cope
> * if the replay sequence contains writes to a relation that is later
> * deleted. (The original coding of this routine would instead suppress
> * the writes, but that seems like it risks losing valuable data if the
> * filesystem loses an inode during a crash. Better to write the data
> * until we are actually told to delete the file.)
> */
> smgrcreate(smgr, forknum, true);
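The first behavior can be modeled the same way. This is a hypothetical sketch, not the real smgrcreate() path: ToyForkState and toy_create_fork_for_redo() are invented names. An existing file is tolerated (as with isRedo = true); a missing file is created; and if the parent directory is gone too, the toy creates a plain directory so replay can continue, which stands in for the "real directory in pg_tblspc" discussed in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy state of one relation fork during redo. */
typedef struct
{
    bool dir_exists;    /* per-database directory under the tablespace */
    bool file_exists;   /* the fork's segment file */
    bool dir_faked;     /* directory was created as a plain dir by redo */
} ToyForkState;

/*
 * Sketch of "create the target file if it doesn't already exist" during
 * redo.  If the parent directory is missing (e.g. the tablespace is dropped
 * later in the WAL stream), the toy creates a plain directory instead of a
 * symlink so that replay can go on writing until told to delete the file.
 */
static void
toy_create_fork_for_redo(ToyForkState *st)
{
    if (st->file_exists)
        return;                 /* redo: an existing file is fine */

    if (!st->dir_exists)
    {
        st->dir_exists = true;  /* plain directory, not a symlink */
        st->dir_faked = true;
    }
    st->file_exists = true;     /* writes land here until the file is dropped */
}
```

The dir_faked flag marks exactly the state this thread is unhappy with: an ordinary directory sitting where pg_tblspc expects a symlink.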

Without the smgrcreate call, make check-world fails due to missing
files for the FSM, the visibility map, and init forks. It is a bit
doubtful whether these cases fall into the so-called category of
"creating nonexistent objects by redo access". In a few places,
XLOG_FPI records are used to create the first page of a file,
including main and init forks, but I don't see a case involving the
main fork during make check-world.

# Most of the failures show up as a standby freeze. I was a bit
# annoyed that make check-world doesn't tell which module is
# currently being tested. In such cases I had to deduce it from the
# sequence of preceding script names, and if the first TAP script of a
# module froze, I had to use ps to find the module.

> > 1. stop XLogReadBufferForRedo from creating a file in nonexistent
> > directories, then remember the failure (I'm not sure how big the
> > impact is.)
> >
> > 2. unconditionally create all objects required for recovery to proceed,
> > 2.1 and ignore the failures.
> > 2.2 and remember the failures.
> >
> > 3. Any other?
> >
> > 2 needs to create a real directory in pg_tblspc. So 1?
>
> I think we could either do 1 or 2. My intuition is that getting 2
> working would be less scary and more likely to be something we would
> feel comfortable back-patching, but 1 is probably a better design in
> the long term. However, I might be wrong -- that's just a guess.

Thanks. I forgot to mention this in the previous mail (though I
mentioned it somewhere upthread), but if we take 2, there's no way
other than creating a real directory in pg_tblspc during recovery. I
don't think that is neat.
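The "remember the failure" part of options 1 and 2.2 could follow the pattern xlogutils.c already uses for invalid pages: log_invalid_page() records them, forget_invalid_pages() drops entries when a later record truncates or drops the relation, and XLogCheckInvalidPages() complains if anything is left once consistency is reached. The sketch below applies that pattern to missing tablespace directories. All names in it (toy_remember_missing_dir() and friends) are hypothetical and do not claim to match any version of the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_MISSING 16

/* Hypothetical: tablespace OIDs whose directories were missing in replay. */
static unsigned int toy_missing[TOY_MAX_MISSING];
static int toy_nmissing = 0;

/* Note the failure instead of erroring out immediately. */
static void
toy_remember_missing_dir(unsigned int spcoid)
{
    for (int i = 0; i < toy_nmissing; i++)
        if (toy_missing[i] == spcoid)
            return;                     /* already remembered */
    if (toy_nmissing < TOY_MAX_MISSING)
        toy_missing[toy_nmissing++] = spcoid;
}

/* A later drop record explains the absence, so forget the entry. */
static void
toy_forget_missing_dir(unsigned int spcoid)
{
    for (int i = 0; i < toy_nmissing; i++)
        if (toy_missing[i] == spcoid)
        {
            toy_missing[i] = toy_missing[--toy_nmissing];  /* swap-remove */
            return;
        }
}

/* At the consistency point, anything still remembered is a real problem. */
static bool
toy_missing_dirs_ok(void)
{
    return toy_nmissing == 0;
}
```

With this shape, option 1 never needs a fake directory: replay just records the OID, and only an entry that survives to the consistency point is treated as an error.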

I haven't yet found how the patch causes the creation of a relation
file that is soon to be removed. However, I found that the v19 patch
fails, possibly due to some change in Cluster.pm. It will take a bit
more time to check that.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
