
Re: connection establishment versus parallel workers

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: connection establishment versus parallel workers
Date:	2025-01-20 05:33:23
Message-ID:	CA+hUKGKgqZuDBEHeervP_bjXFRfouM1HBy-NUEwJxy_yejN-OQ@mail.gmail.com
Lists:	pgsql-hackers

On Tue, Jan 14, 2025 at 9:42 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Tue, Jan 14, 2025 at 8:50 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> > I gave these a closer look, and I still feel that they are both
> > straightforward and reasonable. IIUC the main open question is whether
> > this might cause problems for other PM signal kinds. Like you, I don't see
> > anything immediately obvious there, but I'll admit I'm not terribly
> > familiar with the precise characteristics of postmaster signals. In any
> > case, 0001 feels pretty safe to me.
>
> Cool. Thanks. I'll think about what else could be affected by that
> change as you say, and if nothing jumps out I'll go ahead and commit
> them, back to 16.

I pushed 0001, addressing the main problem.

I think 0002 described and addressed a real phenomenon but only when
you have multiple sockets with non-empty listen queues. If we fixed
the real underlying problems it wouldn't be an issue. I decided to
unsee that for now.

> I have done a lot more study of this problem and was about to write in
> with some more patches to propose for master only. Basically that
> "100" is destroying performance in this workload, which at least on my
> machine hardly gets any parallelism at all, and only in sporadic
> bursts. You can argue that we aren't designed for high frequency
> short-lived workers (we'll have to reuse workers in some way to be
> good at that), but I don't think it has to fail as badly as it does
> today. It falls off a cliff instead of plateauing: we are so busy
> forking that we don't get around to reaping children, so all our slots
> are (artificially) used up most of the time, and the queries that do
> manage to nab one then sit on their hands for a long time at query
> end. "1" gets much smoother results, but as prophesied in aa1351f1,
> the complexity is terrible, possibly even O(n^3) in places depending
> on how you count: there are many places that scan the whole worker
> list, and one that even scans it again for each item, and that is for
> each thing that starts. IOW we have to fix the complexity
> fundamentally. I have a WIP patch that adds a couple of work queues,
> so that the postmaster never has to consider anything more than the
> head of a queue in various places. More soon...
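
For illustration only, the "work queue" idea quoted above can be sketched as a FIFO where both enqueueing and taking the head are O(1), so nothing has to scan a whole worker list. This is not code from the attached patches: `Worker`, `WorkQueue`, `queue_init`, `queue_push` and `queue_pop` are hypothetical names, and the real postmaster data structures differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a worker entry; not PostgreSQL's real struct. */
typedef struct Worker
{
    int            id;
    struct Worker *next;    /* next entry in the same queue, or NULL */
} Worker;

/* A FIFO with head and tail pointers, so push and pop are both O(1). */
typedef struct WorkQueue
{
    Worker *head;
    Worker *tail;
} WorkQueue;

static void
queue_init(WorkQueue *q)
{
    q->head = NULL;
    q->tail = NULL;
}

/* Append a worker at the tail of the queue. */
static void
queue_push(WorkQueue *q, Worker *w)
{
    assert(w != NULL);
    w->next = NULL;
    if (q->tail != NULL)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* Remove and return the head of the queue, or NULL if it is empty. */
static Worker *
queue_pop(WorkQueue *q)
{
    Worker *w = q->head;

    if (w == NULL)
        return NULL;
    q->head = w->next;
    if (q->head == NULL)
        q->tail = NULL;
    w->next = NULL;
    return w;
}
```

With one such queue per kind of pending work (for example "waiting to be started"), the caller only ever looks at `queue_pop()`'s result instead of walking every entry.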

Here's the WIP code I have come up with for that so far.

Remaining opportunities not attempted:
1. When a child exits, we could use a hash table to find it by pid.
2. When looking for a bgworker slot that is not in use, we could do
something better than linear search.
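
As a rough sketch of what those two remaining opportunities could look like (again, not part of the attached patches, and using made-up names and sizes such as `Slot`, `MAX_SLOTS`, `NBUCKETS`, `slot_acquire`, `slot_find` and `slot_release`): a chained hash table keyed by pid gives expected O(1) lookup when a child exits, and a free list threaded through the slot array gives O(1) allocation of an unused slot.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8     /* hypothetical number of worker slots */
#define NBUCKETS  16    /* hypothetical number of hash buckets */

typedef struct Slot
{
    int pid;        /* 0 when the slot is unused */
    int hash_next;  /* next slot index in the same bucket, or -1 */
    int free_next;  /* next slot index in the free list, or -1 */
} Slot;

static Slot slots[MAX_SLOTS];
static int  buckets[NBUCKETS];  /* head slot index per bucket, or -1 */
static int  free_head;          /* first unused slot index, or -1 */

static void
table_init(void)
{
    for (int i = 0; i < NBUCKETS; i++)
        buckets[i] = -1;
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        slots[i].pid = 0;
        slots[i].hash_next = -1;
        slots[i].free_next = (i + 1 < MAX_SLOTS) ? i + 1 : -1;
    }
    free_head = 0;
}

static unsigned
bucket_for(int pid)
{
    return (unsigned) pid % NBUCKETS;
}

/* Take an unused slot in O(1) and index it by pid; -1 if none is free. */
static int
slot_acquire(int pid)
{
    int      i = free_head;
    unsigned b = bucket_for(pid);

    assert(pid > 0);
    if (i < 0)
        return -1;
    free_head = slots[i].free_next;
    slots[i].pid = pid;
    slots[i].hash_next = buckets[b];
    buckets[b] = i;
    return i;
}

/* Find the slot for a pid by walking only its bucket's chain. */
static int
slot_find(int pid)
{
    for (int i = buckets[bucket_for(pid)]; i >= 0; i = slots[i].hash_next)
        if (slots[i].pid == pid)
            return i;
    return -1;
}

/* On child exit: unlink the slot from its bucket and return it to the free list. */
static int
slot_release(int pid)
{
    int *link = &buckets[bucket_for(pid)];

    while (*link >= 0)
    {
        int i = *link;

        if (slots[i].pid == pid)
        {
            *link = slots[i].hash_next;
            slots[i].pid = 0;
            slots[i].hash_next = -1;
            slots[i].free_next = free_head;
            free_head = i;
            return i;
        }
        link = &slots[i].hash_next;
    }
    return -1;
}
```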

Attachment	Content-Type	Size
0001-Remove-BackgroundWorkerStateChange-s-outer-loop.patch	text/x-patch	11.4 KB
0002-Remove-BackgroundWorkerStateChange-s-inner-loop.patch	text/x-patch	3.7 KB
0003-Remove-loops-over-BackgroundWorkerList.patch	text/x-patch	21.2 KB

In response to

Re: connection establishment versus parallel workers at 2025-01-13 20:42:00 from Thomas Munro

Responses

Re: connection establishment versus parallel workers at 2025-02-06 22:53:58 from Thomas Munro
