Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Paul Guo <pguo(at)pivotal(dot)io>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jimmy Yih <jyih(at)pivotal(dot)io>, Ashwin Agrawal <aagrawal(at)pivotal(dot)io>
Subject: Re: Two pg_rewind patches (auto generate recovery conf and ensure clean shutdown)
Date: 2019-10-04 14:21:25
Message-ID: ab5c15e0-773c-fbb4-f0a2-ca00a90035a6@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04.10.2019 11:37, Michael Paquier wrote:
> On Thu, Oct 03, 2019 at 12:43:37PM +0300, Alexey Kondratov wrote:
>> On 03.10.2019 6:07, Michael Paquier wrote:
>>> I have reworked your first patch as per the attached. What do you
>>> think about it? The part with the control file needs to go down to
>>> v12, and I would likely split that into two commits on HEAD: one for
>>> the control file and a second for the recovery.conf portion with the
>>> fix for --no-ensure-shutdown to keep a cleaner history.
>> It looks fine for me excepting the progress reporting part. It now adds
>> PG_CONTROL_FILE_SIZE to fetch_done. However, I cannot find that control file
>> is either included into filemap and fetch_size or counted during
>> calculate_totals(). Maybe I've missed something, but now it looks like we
>> report something that wasn't planned for progress reporting, doesn't
>> it?
> Right. The pre-12 code actually handles that incorrecly as it assumed
> that any files written through file_ops.c should be part of the
> progress. So I went with the simplest solution, and backpatched this
> part with 6f3823b. I have also committed the set of fixes for the new
> options so as we have a better base of work than what's on HEAD
> currently.

Great, thanks.

>
> Regarding the tests, adding a --dry-run command is a good idea.
> However I think that there is more value to automate the use of the
> single user mode automatically in the tests as that's more critical
> from the point of view of rewind run, and stopping the cluster with
> immediate mode causes, as expected, the next --dry-run command to
> fail.
>
> Another thing is that I think that we should use -F with --single.
> This makes recovery faster, and the target data folder is synced
> at the end of pg_rewind anyway.
>
> Using the long option names makes the tests easier to follow in this
> case, so I have switched -R to --write-recovery-conf.
>
> Some comments and the docs have been using some confusing wording, so
> I have reworked what I found (like many "it" in a single sentence
> referring different things).

I agree with all the points. Shutting down target server using
'immediate' mode is a good way to test ensureCleanShutdown automatically.

> Regarding all the set of incompatible options, we have much more of
> that after the initial option parsing so I think that we should group
> all the cheap ones together. Let's tackle that as a separate patch.
> We can also just check after --no-ensure-shutdown directly in
> RewindTest.pm as I have switched the cluster to not be cleanly shut
> down anymore to stress the automatic recovery path, and trigger that
> before running pg_rewind for the local and remote mode.
>
> Attached is an updated patch with all I found. What do you think?

I've checked your patch, but it seems that it cannot be applied as is,
since it e.g. adds a comment to 005_same_timeline.pl without actually
changing the test. So I've slightly modified your patch and tried to fit
both dry-run and ensureCleanShutdown testing together. It works just
fine and fails immediately if any of recent fixes is reverted. I still
think that dry-run testing is worth adding, since it helped to catch
this v12 refactoring issue, but feel free to throw it way if it isn't
commitable right now, of course.

As for incompatible options and sanity checks testing, yes, I agree that
it is a matter of different patch. I attached it as a separate WIP patch
just for history. Maybe I will try to gather more cases there later.

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
v3-0001-Improve-functionality-docs-and-tests-of-R-no-ensu.patch text/x-patch 9.5 KB
v3-0002-WIP-increase-pg_rewind-test-coverage.patch text/x-patch 2.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2019-10-04 14:27:06 Re: [HACKERS] Block level parallel vacuum
Previous Message Masahiko Sawada 2019-10-04 14:03:40 Re: [HACKERS] Block level parallel vacuum