Re: Race condition in recovery?

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: dilipbalaut(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Race condition in recovery?
Date: 2021-06-07 04:57:35
Message-ID: 20210607.135735.1630085931755816893.horikyota.ntt@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At Fri, 4 Jun 2021 10:56:12 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> On Fri, Jun 4, 2021 at 5:25 AM Kyotaro Horiguchi
> <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > I think that's right. And the test script detects the issue for me
> > both on Linux but doesn't work for Windows.
> >
> > '"C:/../Documents/work/postgresql/src/test/recovery/t/cp_history_files"' is not recognized as an internal command or external command ..
>
> Hmm, that's a problem. Can you try the attached version?

Unfortunately no. The backslashes in the binary path need to be
escaped. (taken from PostgresNode.pm:1008)

> (my $perlbin = $^X) =~ s{\\}{\\\\}g if ($TestLib::windows_os);
> $node_primary->append_conf(
> 'postgresql.conf', qq(
> archive_command = '$perlbin "$FindBin::RealBin/cp_history_files" "%p" "$archivedir_primary/%f"'
> ));

This works for me.

> > + # clean up
> > + $node_primary->teardown_node;
> > + $node_standby->teardown_node;
> > + $node_cascade->teardown_node;
> >
> > I don't think explicit teardown is useless as the final cleanup.
>
> I don't know what you mean by this. If it's not useless, good, because
> we're doing it. Or do you mean that you think it is useless, and we
> should remove it?

Ugh! Sorry. I meant "The explicit teardowns are useless". That's not
harmful but it is done by PostgresNode.pm automatically(implicitly)
and we don't do that in the existing scripts.

> > By the way the attached patch is named as "Fix-corner-case..." but
> > doesn't contain the fix. Is it intentional?
>
> No, that was a goof.

As I said upthread the relationship between receiveTLI and
recoveryTargetTLI is not confirmed yet at the point.
findNewestTimeLine() simply searches for the history file with the
largest timeline id so the returned there's a case where the timeline
id that the function returns is not a future of the latest checkpoint
TLI. I think that the fact that rescanLatestTimeLine() checks the
relationship is telling us that we need to do the same in the path as
well.

In my previous proposal, it is done just after the line the patch
touches but it can be in the if (fetching_ckpt) branch.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2021-06-07 05:01:45 Re: Race condition in recovery?
Previous Message Etsuro Fujita 2021-06-07 03:57:25 Re: Asynchronous Append on postgres_fdw nodes.