From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: connection establishment versus parallel workers |
Date: | 2025-01-20 05:33:23 |
Message-ID: | CA+hUKGKgqZuDBEHeervP_bjXFRfouM1HBy-NUEwJxy_yejN-OQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jan 14, 2025 at 9:42 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Tue, Jan 14, 2025 at 8:50 AM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> > I gave these a closer look, and I still feel that they are both
> > straightforward and reasonable. IIUC the main open question is whether
> > this might cause problems for other PM signal kinds. Like you, I don't see
> > anything immediately obvious there, but I'll admit I'm not terribly
> > familiar with the precise characteristics of postmaster signals. In any
> > case, 0001 feels pretty safe to me.
>
> Cool. Thanks. I'll think about what else could be affected by that
> change as you say, and if nothing jumps out I'll go ahead and commit
> them, back to 16.
I pushed 0001, addressing the main problem.
I think 0002 described and addressed a real phenomenon but only when
you have multiple sockets with non-empty listen queues. If we fixed
the real underlying problems it wouldn't be an issue. I decided to
unsee that for now.
> I have done a lot more study of this problem and was about to write in
> with some more patches to propose for master only. Basically that
> "100" is destroying performance in this workload, which at least on my
> machine hardly gets any parallelism at all, and only in sporadic
> bursts. You can argue that we aren't designed for high frequency
> short-lived workers (we'll have to reuse workers in some way to be
> good at that), but I don't think it has to fail as badly as it does
> today. It falls off a cliff instead of plateauing: we are so busy
> forking that we don't get around to reaping children, so all our slots
> are (artificially) used up most of the time, and the queries that do
> manage to nab one then sit on their hands for a long time at query
> end. "1" gets much smoother results, but as prophesied in aa1351f1,
> the complexity is terrible, possibly even O(n^3) in places depending
> on how you count: there are many places that scan the whole worker
> list, and one that even scans it again for each item, and that is for
> each thing that starts. IOW we have to fix the complexity
> fundamentally. I have a WIP patch that adds a couple of work queues,
> so that the postmaster never has to consider anything more than the
> head of a queue in various places. More soon...
Here's the WIP code I have come up with for that so far.
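For readers without the attachments to hand, the "head of a queue" idea can be illustrated with a minimal sketch. This is not taken from the attached patches: PendingWorker, WorkQueue and the queue_* functions are hypothetical names, and the real postmaster structures differ. The point is only that enqueue and dequeue are O(1), so nothing ever scans the whole worker list to find the next thing to start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not PostgreSQL's actual structures. */
typedef struct PendingWorker
{
    int         slot;               /* index of the worker's slot */
    struct PendingWorker *next;     /* next entry in the queue, or NULL */
} PendingWorker;

typedef struct WorkQueue
{
    PendingWorker *head;            /* next worker to consider */
    PendingWorker *tail;            /* most recently queued worker */
} WorkQueue;

static void
queue_init(WorkQueue *q)
{
    q->head = NULL;
    q->tail = NULL;
}

/* O(1): append a worker at the tail. */
static void
queue_push(WorkQueue *q, PendingWorker *w)
{
    assert(q != NULL && w != NULL);
    w->next = NULL;
    if (q->tail != NULL)
        q->tail->next = w;
    else
        q->head = w;
    q->tail = w;
}

/* O(1): remove and return the head, or NULL if the queue is empty. */
static PendingWorker *
queue_pop(WorkQueue *q)
{
    PendingWorker *w = q->head;

    if (w == NULL)
        return NULL;
    q->head = w->next;
    if (q->head == NULL)
        q->tail = NULL;
    w->next = NULL;
    return w;
}
```

With one such queue per kind of pending work, each pass only needs to look at queue_pop()'s result rather than every registered worker.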
Remaining opportunities not attempted:
1. When a child exits, we could use a hash table to find it by pid.
2. When looking for a bgworker slot that is not in use, we could do
something better than linear search.
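A rough sketch of what those two remaining items could look like, purely as an illustration and not something the patches do: free slots are kept on a free list so that finding one is O(1), and in-use slots are chained into a small hash table keyed by pid so that an exiting child can be found without scanning. All names here (Slot, slot_acquire, slot_find_by_pid, slot_release, MAX_SLOTS, NUM_BUCKETS) are made up for the example, and the fixed sizes are assumptions.

```c
#include <assert.h>

#define MAX_SLOTS   8       /* hypothetical; stands in for the slot count */
#define NUM_BUCKETS 16      /* power of two, so we can mask instead of mod */
#define NO_SLOT     (-1)

typedef struct Slot
{
    int         in_use;
    int         pid;        /* valid only while in_use */
    int         next;       /* next slot in the free list or hash chain */
} Slot;

static Slot slots[MAX_SLOTS];
static int  free_head;                  /* first free slot, or NO_SLOT */
static int  buckets[NUM_BUCKETS];       /* heads of the per-bucket chains */

static void
slots_init(void)
{
    for (int i = 0; i < MAX_SLOTS; i++)
    {
        slots[i].in_use = 0;
        slots[i].pid = 0;
        slots[i].next = (i + 1 < MAX_SLOTS) ? i + 1 : NO_SLOT;
    }
    free_head = 0;
    for (int i = 0; i < NUM_BUCKETS; i++)
        buckets[i] = NO_SLOT;
}

static int
bucket_for(int pid)
{
    return (int) ((unsigned int) pid & (NUM_BUCKETS - 1));
}

/* O(1): take a slot off the free list and index it by pid. */
static int
slot_acquire(int pid)
{
    int         s = free_head;
    int         b = bucket_for(pid);

    if (s == NO_SLOT)
        return NO_SLOT;         /* every slot is in use */
    free_head = slots[s].next;
    slots[s].in_use = 1;
    slots[s].pid = pid;
    slots[s].next = buckets[b];
    buckets[b] = s;
    return s;
}

/* Expected O(1): walk only the chain for this pid's bucket. */
static int
slot_find_by_pid(int pid)
{
    for (int s = buckets[bucket_for(pid)]; s != NO_SLOT; s = slots[s].next)
    {
        if (slots[s].pid == pid)
            return s;
    }
    return NO_SLOT;
}

/* Unlink the slot from its hash chain and push it onto the free list. */
static void
slot_release(int s)
{
    int        *link;

    assert(s >= 0 && s < MAX_SLOTS && slots[s].in_use);
    link = &buckets[bucket_for(slots[s].pid)];
    while (*link != s)
        link = &slots[*link].next;
    *link = slots[s].next;
    slots[s].in_use = 0;
    slots[s].next = free_head;
    free_head = s;
}
```

Reusing the same next field for both the free list and the hash chain works because a slot is only ever on one of them at a time.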
Attachment | Content-Type | Size |
---|---|---|
0001-Remove-BackgroundWorkerStateChange-s-outer-loop.patch | text/x-patch | 11.4 KB |
0002-Remove-BackgroundWorkerStateChange-s-inner-loop.patch | text/x-patch | 3.7 KB |
0003-Remove-loops-over-BackgroundWorkerList.patch | text/x-patch | 21.2 KB |