Re: connection establishment versus parallel workers

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: connection establishment versus parallel workers
Date: 2025-01-20 05:33:23
Message-ID: CA+hUKGKgqZuDBEHeervP_bjXFRfouM1HBy-NUEwJxy_yejN-OQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 14, 2025 at 9:42 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Tue, Jan 14, 2025 at 8:50 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> > I gave these a closer look, and I still feel that they are both
> > straightforward and reasonable. IIUC the main open question is whether
> > this might cause problems for other PM signal kinds. Like you, I don't see
> > anything immediately obvious there, but I'll admit I'm not terribly
> > familiar with the precise characteristics of postmaster signals. In any
> > case, 0001 feels pretty safe to me.
>
> Cool. Thanks. I'll think about what else could be affected by that
> change as you say, and if nothing jumps out I'll go ahead and commit
> them, back to 16.

I pushed 0001, addressing the main problem.

I think 0002 described and addressed a real phenomenon but only when
you have multiple sockets with non-empty listen queues. If we fixed
the real underlying problems it wouldn't be an issue. I decided to
unsee that for now.

> I have done a lot more study of this problem and was about to write in
> with some more patches to propose for master only. Basically that
> "100" is destroying performance in this workload, which at least on my
> machine hardly gets any parallelism at all, and only in sporadic
> bursts. You can argue that we aren't designed for high frequency
> short-lived workers (we'll have to reuse workers in some way to be
> good at that), but I don't think it has to fail as badly as it does
> today. It falls off a cliff instead of plateauing: we are so busy
> forking that we don't get around to reaping children, so all our slots
> are (artificially) used up most of the time, and the queries that do
> manage to nab one then sit on their hands for a long time at query
> end. "1" gets much smoother results, but as prophesied in aa1351f1,
> the complexity is terrible, possibly even O(n^3) in places depending
> on how you count: there are many places that scan the whole worker
> list, and one that even scans it again for each item, and that is for
> each thing that starts. IOW we have to fix the complexity
> fundamentally. I have a WIP patch that adds a couple of work queues,
> so that the postmaster never has to consider anything more than the
> head of a queue in various places. More soon...
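
To make the head-of-queue idea in the quoted paragraph concrete, here is a minimal sketch. It is not taken from the attached patches: the Worker and WorkerQueue types and the start_pending_workers() function are hypothetical names invented for illustration. The point is only that a FIFO of workers waiting to start lets the caller take the next one in O(1) instead of scanning the whole worker list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worker record; not PostgreSQL's actual bgworker struct. */
typedef struct Worker
{
    int            id;
    struct Worker *next;    /* next worker in the same queue, or NULL */
} Worker;

/* FIFO queue: O(1) push at the tail, O(1) pop at the head. */
typedef struct WorkerQueue
{
    Worker *head;
    Worker *tail;
} WorkerQueue;

static void
queue_push(WorkerQueue *q, Worker *w)
{
    assert(q != NULL && w != NULL);
    w->next = NULL;
    if (q->tail)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

static Worker *
queue_pop(WorkerQueue *q)
{
    Worker *w = q->head;

    if (w == NULL)
        return NULL;
    q->head = w->next;
    if (q->head == NULL)
        q->tail = NULL;
    w->next = NULL;
    return w;
}

/*
 * Start at most max_starts workers, looking only at the queue head.
 * Recording the id stands in for actually forking the worker.  Returning
 * after a bounded number of starts lets the caller get back to other
 * duties, such as reaping exited children.
 */
static int
start_pending_workers(WorkerQueue *pending, int max_starts, int *started_ids)
{
    int     n = 0;
    Worker *w;

    while (n < max_starts && (w = queue_pop(pending)) != NULL)
        started_ids[n++] = w->id;
    return n;
}
```

With a cap of 1 per call, this corresponds to the "1" setting discussed above, but each call costs O(1) rather than a scan of every registered worker.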

Here's the WIP code I have come up with for that so far.

Remaining opportunities not attempted:
1. When a child exits, we could use a hash table to find it by pid.
2. When looking for a bgworker slot that is not in use, we could do
something better than linear search.
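
For illustration only, both remaining opportunities could be sketched roughly as below. None of this is in the attached patches: SlotTable, slot_acquire(), slot_release_by_pid(), MAX_SLOTS and NBUCKETS are hypothetical names and sizes. Free slots are kept on a linked free list so that finding an unused one is O(1), and in-use slots are chained into a small hash table keyed by pid so that an exiting child can be found without walking every slot.

```c
#include <assert.h>

#define MAX_SLOTS 8     /* hypothetical number of worker slots */
#define NBUCKETS  16    /* hypothetical number of hash buckets */

typedef struct Slot
{
    int pid;            /* 0 while the slot is free */
    int next;           /* next slot in free list or hash chain, -1 = end */
} Slot;

typedef struct SlotTable
{
    Slot slots[MAX_SLOTS];
    int  free_head;             /* first free slot, or -1 */
    int  buckets[NBUCKETS];     /* heads of the pid hash chains */
} SlotTable;

static void
table_init(SlotTable *t)
{
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        t->slots[i].pid = 0;
        t->slots[i].next = (i + 1 < MAX_SLOTS) ? i + 1 : -1;
    }
    t->free_head = 0;
    for (int b = 0; b < NBUCKETS; b++)
        t->buckets[b] = -1;
}

static unsigned
bucket_for(int pid)
{
    return (unsigned) pid % NBUCKETS;
}

/* Take a free slot in O(1) and index it by pid; returns -1 if none is free. */
static int
slot_acquire(SlotTable *t, int pid)
{
    int      i = t->free_head;
    unsigned b = bucket_for(pid);

    assert(pid > 0);
    if (i < 0)
        return -1;
    t->free_head = t->slots[i].next;
    t->slots[i].pid = pid;
    t->slots[i].next = t->buckets[b];   /* push onto the hash chain */
    t->buckets[b] = i;
    return i;
}

/*
 * Find the slot of an exited child by pid, unlink it from its hash chain
 * and put it back on the free list.  Returns the slot index, or -1 if the
 * pid is unknown.  Only one (normally short) chain is walked.
 */
static int
slot_release_by_pid(SlotTable *t, int pid)
{
    int *link = &t->buckets[bucket_for(pid)];

    while (*link >= 0)
    {
        int i = *link;

        if (t->slots[i].pid == pid)
        {
            *link = t->slots[i].next;
            t->slots[i].pid = 0;
            t->slots[i].next = t->free_head;
            t->free_head = i;
            return i;
        }
        link = &t->slots[i].next;
    }
    return -1;
}
```

Reusing the single next field for both the free list and the hash chain works because a slot is only ever on one of the two at a time.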

Attachment                                                   Content-Type   Size
0001-Remove-BackgroundWorkerStateChange-s-outer-loop.patch   text/x-patch   11.4 KB
0002-Remove-BackgroundWorkerStateChange-s-inner-loop.patch   text/x-patch   3.7 KB
0003-Remove-loops-over-BackgroundWorkerList.patch            text/x-patch   21.2 KB
