Re: optimizing pg_upgrade's once-in-each-database steps

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing pg_upgrade's once-in-each-database steps
Date: 2024-07-19 21:21:37
Message-ID: ZprY4e23vR95Nc51@nathan
Lists: pgsql-hackers

On Thu, Jul 18, 2024 at 09:57:23AM +0200, Daniel Gustafsson wrote:
>> On 17 Jul 2024, at 23:32, Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
>> On Wed, Jul 17, 2024 at 11:16:59PM +0200, Daniel Gustafsson wrote:
>
>>> +static void
>>> +dispatch_query(const ClusterInfo *cluster, AsyncSlot *slot,
>>> ....
>>> + pg_free(query);
>>> +}
>>>
>>> A minor point, perhaps fueled by me not having played around much with this
>>> patchset. It seems a bit odd that dispatch_query is responsible for freeing
>>> the query from the get_query callback. I would have expected the output from
>>> AsyncTaskGetQueryCB to be stored in AsyncTask and released by async_task_free.
>>
>> I don't see any problem with doing it the way you suggest.

Actually, I do see a problem. If we do it this way, we'll have to store a
query string for each database somewhere, which seems unnecessary.

However, while looking into this, I noticed that only one get_query
callback (get_db_subscription_count()) actually customizes the generated
query using information in the provided DbInfo. AFAICT we can do this
particular step without running a query in each database, as I mentioned
elsewhere [0]. That should speed things up a bit and allow us to simplify
the AsyncTask code.

With that, if we are willing to assume that a given get_query callback will
generate the same string for all databases (and I think we should), we can
run the callback once and save the string in the step for dispatch_query()
to use. This would look more like what you suggested in the quoted text.
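
For illustration only, here is a minimal, self-contained sketch of that idea.
It is not the patch's code: SketchStep, sketch_get_query_cb, step_prepare(),
step_free(), example_get_query(), and the simplified DbInfo and
dispatch_query() below are hypothetical stand-ins, and the "dispatch" only
prints the query rather than sending it over a connection. It assumes the
callback returns a malloc'd string that is identical for every database.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for pg_upgrade's DbInfo; only the name is modeled. */
typedef struct DbInfo
{
	const char *db_name;
} DbInfo;

/* Hypothetical callback type: returns a malloc'd query string. */
typedef char *(*sketch_get_query_cb) (void *arg);

/* Hypothetical step: the query is generated once and cached here. */
typedef struct SketchStep
{
	sketch_get_query_cb get_query;
	void	   *arg;
	char	   *query;			/* owned by the step, freed in step_free() */
} SketchStep;

/* Counts callback invocations so the "run once" property is observable. */
static int	get_query_calls = 0;

/* Example callback; assumed to produce the same string for every database. */
static char *
example_get_query(void *arg)
{
	const char *text = "SELECT 1";
	char	   *copy = malloc(strlen(text) + 1);

	(void) arg;
	if (copy == NULL)
		abort();
	strcpy(copy, text);
	get_query_calls++;
	return copy;
}

/* Run the callback once and keep the result in the step. */
static void
step_prepare(SketchStep *step)
{
	if (step->query == NULL)
		step->query = step->get_query(step->arg);
}

/*
 * Simplified dispatch: real code would send the query on a connection to
 * the given database.  Here we only print it.  Nothing is freed here.
 */
static const char *
dispatch_query(const SketchStep *step, const DbInfo *db)
{
	assert(step->query != NULL);
	printf("%s: %s\n", db->db_name, step->query);
	return step->query;
}

/* Release the cached query when the step is done. */
static void
step_free(SketchStep *step)
{
	free(step->query);
	step->query = NULL;
}
```

A caller would invoke step_prepare() once, then dispatch_query() for each
database, then step_free(), so the per-database loop never allocates or frees
a query string.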

[0] https://postgr.es/m/ZprQJv_TxccN3tkr%40nathan

--
nathan
