From: | Nathan Bossart <nathandbossart(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Cc: | thomas(dot)munro(at)gmail(dot)com |
Subject: | connection establishment versus parallel workers |
Date: | 2024-12-11 20:42:58 |
Message-ID: | Z1n5UpAiGDmFcMmd@nathan |
Lists: | pgsql-hackers |

My team recently received a report about connection establishment times
increasing substantially from v16 onwards. Upon further investigation,
this seems to have something to do with commit 7389aad (which moved a lot
of postmaster code out of signal handlers) in conjunction with workloads
that generate many parallel workers. I've attached a set of reproduction
steps. The issue seems to be worst on larger machines (e.g., r8g.48xlarge,
r5.24xlarge) when max_parallel_workers/max_worker_processes is set very high
(>= 48).

Our theory is that commit 7389aad (and follow-ups like commit 239b175) made
parallel worker processing much more responsive to the point of contending
with incoming connections, and that before this change, the kernel balanced
the execution of the signal handlers and ServerLoop() to prevent this. I
don't have a concrete proposal yet, but I thought it was still worth
starting a discussion. TBH I'm not sure we really need to do anything
since this arguably comes down to a trade-off between connection and worker
responsiveness.

--
nathan

Attachment | Content-Type | Size |
---|---|---|
repro.txt | text/plain | 504 bytes |