Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: use CREATE DATABASE STRATEGY = FILE_COPY in pg_upgrade
Date: 2024-06-11 08:39:51
Message-ID: CAEze2WhfkYQbyRaBptT2BQ5Sv-DN0gEWCuSMWbQzCwyb8qjkBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 11 Jun 2024 at 04:01, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>
> On Fri, Jun 07, 2024 at 11:10:25AM +0200, Matthias van de Meent wrote:
> > My primary concern isn't the IO, but the O(shared_buffers) that we
> > have to go through during a checkpoint. As I mentioned upthread, it is
> > reasonably possible the new cluster is already setup with a good
> > fraction of the old system's shared_buffers configured. Every
> > checkpoint has to scan all those buffers, which IMV can get (much)
> > more expensive than the IO overhead caused by the WAL_LOG strategy. It
> > may be a baseless fear as I haven't done the performance benchmarks
> > for this, but I wouldn't be surprised if shared_buffers=8GB would
> > measurably impact the upgrade performance in the current patch (vs the
> > default 128MB).
>
> I did a handful of benchmarks on an r5.24xlarge that seem to prove your
> point. The following are the durations of the pg_restore step of
> pg_upgrade:
>
> * 10k empty databases, 128MB shared_buffers
> WAL_LOG: 1m 01s
> FILE_COPY: 0m 22s
>
> * 10k empty databases, 100GB shared_buffers
> WAL_LOG: 2m 03s
> FILE_COPY: 5m 08s
>
> * 2.5k databases with 10k tables each, 128MB shared_buffers
> WAL_LOG: 17m 20s
> FILE_COPY: 16m 44s
>
> * 2.5k databases with 10k tables each, 100GB shared_buffers
> WAL_LOG: 16m 39s
> FILE_COPY: 15m 21s
>
> I was surprised with the last result, but there's enough other stuff
> happening during such a test that I hesitate to conclude much.

If you still have the test data set up, could you test the attached
patch (which does skip the checkpoints in FILE_COPY mode during binary
upgrades)?

>>>> If such a change were implemented (i.e. no checkpoints for FILE_COPY
>>>> in binary upgrade, with a single manual checkpoint after restoring
>>>> template1 in create_new_objects) I think most of my concerns with this
>>>> patch would be alleviated.
>>>
>>> Yeah, I think that's a valid point. The second checkpoint is to ensure
>>> that the XLOG_DBASE_CREATE_FILE_COPY never gets replayed. However, for
>>> binary upgrades, we don't need that guarantee because a checkpoint
>>> will be performed during shutdown at the end of the upgrade anyway.
>>
>> Indeed.
>
> It looks like pg_dump always uses template0, so AFAICT we don't even need
> the suggested manual checkpoint after restoring template1.

Thanks for reminding me. It seems I misunderstood the reason why we
first process template1 in create_new_objects, as I didn't read the
comments thoroughly enough.

> I do like the idea of skipping a bunch of unnecessary operations in binary
> upgrade mode, since it'll help me in my goal of speeding up pg_upgrade.
> But I'm a bit hesitant to get too fancy here and to introduce a bunch of
> new "if (IsBinaryUpgrade)" checks if the gains in the field won't be all
> that exciting. However, we've already sprinkled such checks quite
> liberally, so maybe I'm being too cautious...

Hmm, yes. From an IO perspective I think this could be an improvement,
but let's check the numbers first.

Kind regards,

Matthias van de Meent

Attachment Content-Type Size
0001-pg_upgrade-use-CREATE-DATABASE-.-STRATEGY-FILE_COPY.patch application/octet-stream 3.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-06-11 08:41:06 Re: Ambiguous description on new columns
Previous Message Amit Kapila 2024-06-11 08:39:15 Re: confirmed flush lsn seems to be move backward in certain error cases