Re: optimizing pg_upgrade's once-in-each-database steps

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: Ilya Gladyshev <ilya(dot)v(dot)gladyshev(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: optimizing pg_upgrade's once-in-each-database steps
Date: 2024-08-10 15:17:27
Message-ID: ZreEh54Xr6D3Qy8k@nathan
Lists: pgsql-hackers

On Fri, Aug 09, 2024 at 04:06:16PM -0400, Corey Huinker wrote:
>> I'll admit I hadn't really considered pipelining, but I'm tempted to say
>> that it's probably not worth the complexity. Not only do most of the tasks
>> have only one step, but even tasks like the data types check are unlikely
>> to require more than a few queries for upgrades from supported versions.
>
> Can you point me to a complex multi-step task that you think wouldn't work
> for pipelining? My skimming of the other patches all seemed to be one query
> with one result set to be processed by one callback.

I think it would work fine. I'm just not sure it's worth it, especially
for tasks that run exactly one query in each connection.

>> Furthermore, most of the callbacks should do almost nothing for a given
>> upgrade, and since pg_upgrade runs on the server, client/server round-trip
>> time should be pretty low.
>
> To my mind, that makes pipelining make more sense, you throw out N queries,
> most of which are trivial, and by the time you cycle back around and start
> digesting result sets via callbacks, more of the queries have finished
> because they were waiting on the query ahead of them in the pipeline, not
> waiting on a callback to finish consuming its assigned result set and then
> launching the next task query.

My assumption is that the "waiting for a callback before launching the next
query" time will typically be pretty short in practice. I could try
measuring it...
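
One rough way to take that measurement, purely as a sketch: wrap each
result-processing callback in a timer and total up the time spent inside it,
since that is the window in which the next query has not been sent yet. All
names below (CallbackTimer, timed_callback, bump_counter) are hypothetical
and are not part of pg_upgrade; the callback signature is an assumption
chosen for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical callback shape; pg_upgrade's real callbacks differ. */
typedef void (*result_callback) (void *arg);

typedef struct CallbackTimer
{
	uint64_t	total_ns;		/* accumulated time inside callbacks */
	uint64_t	calls;			/* number of callbacks timed */
} CallbackTimer;

/* Current monotonic clock reading in nanoseconds. */
static uint64_t
now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000000ULL + (uint64_t) ts.tv_nsec;
}

/* Run one callback and add its elapsed time to the running total. */
static void
timed_callback(CallbackTimer *timer, result_callback cb, void *arg)
{
	uint64_t	start;

	assert(timer != NULL && cb != NULL);
	start = now_ns();
	cb(arg);
	timer->total_ns += now_ns() - start;
	timer->calls++;
}

/* Stand-in for a trivial callback: it only bumps a counter. */
static void
bump_counter(void *arg)
{
	(*(int *) arg)++;
}
```

Dividing total_ns by calls after a run gives the average delay a callback
adds before the next query can be launched, which is the number to compare
against the round-trip time.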

>> Perhaps pipelining would make more sense if we consolidated the tasks a bit
>> better, but when I last looked into that, I didn't see a ton of great
>> opportunities that would help anything except for upgrades from really old
>> versions. Even then, I'm not sure if pipelining is worth it.
>
> I think you'd want to do the opposite of consolidating the tasks. If
> anything, you'd want to break them down in known single-query operations,
> and if the callback function for one of them happens to queue up a
> subsequent query (with subsequent callback) then so be it.

By "consolidating," I mean combining tasks into fewer tasks with additional
steps. This would allow us to reuse connections instead of creating N
connections for every single query. If we used a task per query, I'd
expect pipelining to provide zero benefit.
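
To put numbers on that last point, here is a back-of-the-envelope model. It
is a deliberate simplification and not pg_upgrade code: it assumes one
connection per database per task, ignores server-side execution time, and
treats a pipelined batch as costing about one round trip of waiting. The
function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical cost model: round trips one connection spends waiting on
 * query results.
 *
 * Without pipelining, each query is sent only after the previous result
 * has been consumed, so every step costs one round trip.  With pipelining,
 * all steps are sent before any result is read, so the waits overlap and
 * the connection pays roughly one round trip in total.
 */
static long
round_trips_per_connection(int steps, bool pipelined)
{
	assert(steps >= 0);
	if (steps == 0)
		return 0;
	return pipelined ? 1 : steps;
}

/* Round trips saved by pipelining one task across all databases. */
static long
round_trips_saved(int ndatabases, int steps_per_task)
{
	assert(ndatabases >= 0);
	return (long) ndatabases *
		(round_trips_per_connection(steps_per_task, false) -
		 round_trips_per_connection(steps_per_task, true));
}
```

Under this model, round_trips_saved(1000, 1) is 0: a task with a single
query per connection has nothing to overlap. A consolidated task with four
steps gives round_trips_saved(1000, 4) = 3000, which is where any benefit
from pipelining would have to come from.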

--
nathan
