
Re: optimizing pg_upgrade's once-in-each-database steps

From:	Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: optimizing pg_upgrade's once-in-each-database steps
Date:	2024-07-31 21:55:33
Message-ID:	10c1c8dd-4685-46d4-80be-56bdfca8659a@gmail.com
Lists:	pgsql-hackers

On 22.07.2024 21:07, Nathan Bossart wrote:
> On Fri, Jul 19, 2024 at 04:21:37PM -0500, Nathan Bossart wrote:
>> However, while looking into this, I noticed that only one get_query
>> callback (get_db_subscription_count()) actually customizes the generated
>> query using information in the provided DbInfo. AFAICT we can do this
>> particular step without running a query in each database, as I mentioned
>> elsewhere [0]. That should speed things up a bit and allow us to simplify
>> the AsyncTask code.
>>
>> With that, if we are willing to assume that a given get_query callback will
>> generate the same string for all databases (and I think we should), we can
>> run the callback once and save the string in the step for dispatch_query()
>> to use. This would look more like what you suggested in the quoted text.
> Here is a new patch set. I've included the latest revision of the patch to
> fix get_db_subscription_count() from the other thread [0] as 0001 since I
> expect that to be committed soon. I've also moved the patch that moves the
> "live_check" variable to "user_opts" to 0002 since I plan on committing
> that sooner than later, too. Otherwise, I've tried to address all feedback
> provided thus far.
>
> [0] https://commitfest.postgresql.org/49/5135/
>
Hi,

I like your idea of parallelizing these checks with the async libpq API;
thanks for working on it. The patch doesn't apply cleanly on master
anymore, but I rebased it locally and took it for a quick spin with a
pg16 instance of 1000 empty databases. I didn't see any regressions with
-j 1, and there's a clear speedup with -j 8 (33 sec vs 8 sec for these checks).

One thing that could be improved: in process_slot, once all query
callbacks for the current connection have run, we could start a new
connection right away, instead of just returning and establishing the
new connection only on the next iteration of the loop in async_task_run,
possibly after sleeping on select().
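To illustrate the idea, here is a toy model rather than the patch's AsyncTask code. It assumes each pass of the loop advances every slot by exactly one state (idle, connecting, querying), makes no libpq calls, and uses hypothetical names (Slot, Task, process_slot_model, run_model). It only counts how many passes the loop needs when a finished slot waits for the next pass versus when it starts its next connection immediately.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SLOTS 64

/* Hypothetical slot states for a toy model of a connection-slot loop. */
typedef enum { SLOT_IDLE, SLOT_CONNECTING, SLOT_QUERYING } SlotState;

typedef struct {
    SlotState state;
    int db;             /* database this slot is serving, or -1 */
} Slot;

typedef struct {
    int ndbs;           /* total databases to visit */
    int next_db;        /* next database not yet handed to a slot */
    int completed;      /* databases whose callbacks have run */
} Task;

/* Hand the next database to the slot, or leave it idle if none remain. */
static void start_connection(Task *task, Slot *slot)
{
    if (task->next_db < task->ndbs) {
        slot->db = task->next_db++;
        slot->state = SLOT_CONNECTING;      /* stands in for PQconnectStart() */
    } else {
        slot->db = -1;
        slot->state = SLOT_IDLE;
    }
}

/* One visit to a slot per pass; "eager" reconnects as soon as it finishes. */
static void process_slot_model(Task *task, Slot *slot, bool eager)
{
    switch (slot->state) {
    case SLOT_IDLE:
        start_connection(task, slot);
        break;
    case SLOT_CONNECTING:
        slot->state = SLOT_QUERYING;        /* connected; stands in for PQsendQuery() */
        break;
    case SLOT_QUERYING:
        task->completed++;                  /* results read, callbacks run, connection closed */
        if (eager) {
            start_connection(task, slot);   /* start the next database now */
        } else {
            slot->db = -1;
            slot->state = SLOT_IDLE;        /* wait for the next pass */
        }
        break;
    }
}

/* Count passes over all slots (one pass per select() wakeup) until done. */
static int run_model(int ndbs, int nslots, bool eager)
{
    Task task = { ndbs, 0, 0 };
    Slot slots[MAX_SLOTS];
    int passes = 0;

    assert(nslots > 0 && nslots <= MAX_SLOTS);
    for (int i = 0; i < nslots; i++) {
        slots[i].state = SLOT_IDLE;
        slots[i].db = -1;
    }

    while (task.completed < task.ndbs) {
        passes++;
        for (int i = 0; i < nslots; i++)
            process_slot_model(&task, &slots[i], eager);
    }
    return passes;
}
```

Under this model, run_model(8, 2, false) needs 12 passes while run_model(8, 2, true) needs 9: each slot saves one pass per database after its first. In a real run the gain would also depend on how long select() would otherwise have slept.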

+1 to Jeff's suggestion that we could reuse connections, though that is
perhaps a separate story.

Regards,

Ilya

In response to

Re: optimizing pg_upgrade's once-in-each-database steps at 2024-07-22 20:07:10 from Nathan Bossart
