Re: optimizing pg_upgrade's once-in-each-database steps

From: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing pg_upgrade's once-in-each-database steps
Date: 2024-07-31 21:55:33
Message-ID: 10c1c8dd-4685-46d4-80be-56bdfca8659a@gmail.com
Lists: pgsql-hackers


On 22.07.2024 21:07, Nathan Bossart wrote:
> On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
>> However, while looking into this, I noticed that only one get_query
>> callback (get_db_subscription_count()) actually customizes the generated
>> query using information in the provided DbInfo. AFAICT we can do this
>> particular step without running a query in each database, as I mentioned
>> elsewhere [0]. That should speed things up a bit and allow us to simplify
>> the AsyncTask code.
>>
>> With that, if we are willing to assume that a given get_query callback will
>> generate the same string for all databases (and I think we should), we can
>> run the callback once and save the string in the step for dispatch_query()
>> to use. This would look more like what you suggested in the quoted text.
> Here is a new patch set. I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon. I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too. Otherwise, I've tried to address all feedback
> provided thus far.
>
> [0] https://commitfest.postgresql.org/49/5135/
>
Hi,

I like your idea of parallelizing these checks with the async libpq API,
thanks for working on it. The patch doesn't apply cleanly on master
anymore, but I've rebased it locally and taken it for a quick spin with a
pg16 instance of 1000 empty databases. I didn't see any regressions with
-j 1, and there is a noticeable speedup with -j 8 (33 sec vs 8 sec for
these checks).

One thing that could be improved: once process_slot has run all query
callbacks for the current connection, it could start a new connection
right away. Currently it just returns, and the new connection is only
established on the next iteration of the loop in async_task_run,
potentially after sleeping on select.
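
To illustrate the difference, here is a rough toy simulation. It is not
code from the patch and does not use libpq: the Slot struct, the state
names, NSTEPS, start_conn and run_tasks are all made up for this sketch,
and each pass of the outer loop merely stands in for one wakeup from
select. It only counts how many passes are needed when a finished slot
waits for the next pass ("lazy") versus when it picks up the next database
immediately ("eager").

```c
#include <assert.h>
#include <stdbool.h>

#define NSTEPS    3   /* hypothetical number of query callbacks per database */
#define MAX_SLOTS 16

/* Hypothetical slot states; the patch may name and structure these differently. */
typedef enum { SLOT_FREE, SLOT_CONNECTING, SLOT_RUNNING } SlotState;

typedef struct
{
    SlotState state;
    int       db;     /* index of the database this slot is working on */
    int       step;   /* number of query callbacks already completed */
} Slot;

/* Give the slot the next unprocessed database, if any remain. */
static void
start_conn(Slot *slot, int *next_db, int ndbs)
{
    if (*next_db >= ndbs)
    {
        slot->state = SLOT_FREE;
        return;
    }
    slot->db = (*next_db)++;
    slot->step = 0;
    slot->state = SLOT_CONNECTING;
}

/* Advance one slot by a single event. */
static void
process_slot(Slot *slot, int *next_db, int ndbs, bool eager)
{
    switch (slot->state)
    {
        case SLOT_FREE:
            start_conn(slot, next_db, ndbs);
            break;
        case SLOT_CONNECTING:
            slot->state = SLOT_RUNNING;
            break;
        case SLOT_RUNNING:
            if (++slot->step == NSTEPS)
            {
                slot->state = SLOT_FREE;
                /* The suggested change: do not wait for the next pass. */
                if (eager)
                    start_conn(slot, next_db, ndbs);
            }
            break;
    }
}

/* Returns the number of loop passes, each standing in for one select() wait. */
static int
run_tasks(int ndbs, int nslots, bool eager)
{
    Slot slots[MAX_SLOTS] = {0};   /* zero-initialized, so every slot is SLOT_FREE */
    int  next_db = 0;
    int  passes = 0;

    assert(nslots > 0 && nslots <= MAX_SLOTS);

    for (;;)
    {
        bool busy = false;

        for (int i = 0; i < nslots; i++)
            process_slot(&slots[i], &next_db, ndbs, eager);
        passes++;

        for (int i = 0; i < nslots; i++)
            if (slots[i].state != SLOT_FREE)
                busy = true;
        if (!busy && next_db >= ndbs)
            break;
    }
    return passes;
}
```

In this model, run_tasks(4, 1, false) takes 20 passes and
run_tasks(4, 1, true) takes 17, i.e. the eager variant saves one pass per
database after the first. How much that matters in practice depends on how
long the real loop actually sleeps, which this sketch does not model.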

+1 to Jeff's suggestion that we could reuse connections, though that is
perhaps a separate story.

Regards,

Ilya
