From: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(at)vondra(dot)me>
Subject: Re: NUMA shared memory interleaving
Date: 2025-04-18 17:48:16
Message-ID: aAKQYKIFtCTWkcbJ@ip-10-97-1-34.eu-west-3.compute.internal
Lists: pgsql-hackers
Hi,
On Thu, Apr 17, 2025 at 01:58:44AM +1200, Thomas Munro wrote:
> On Wed, Apr 16, 2025 at 9:14 PM Jakub Wartak
> <jakub(dot)wartak(at)enterprisedb(dot)com> wrote:
> > 2. Should we also interleave DSA/DSM for Parallel Query? (I'm not an
> > expert on DSA/DSM at all)
>
> I have no answers but I have speculated for years about a very
> specific case (without any idea where to begin due to lack of ... I
> guess all this sort of stuff): in ExecParallelHashJoinNewBatch(),
> workers split up and try to work on different batches on their own to
> minimise contention, and when that's not possible (more workers than
> batches, or finishing their existing work at different times and going
> to help others), they just proceed in round-robin order. A beginner
> thought is: if you're going to help someone working on a hash table,
> it would surely be best to have the CPUs and all the data on the same
> NUMA node. During loading, cache line ping pong would be cheaper, and
> during probing, it *might* be easier to tune explicit memory prefetch
> timing that way as it would look more like a single node system with a
> fixed latency, IDK (I've shared patches for prefetching before that
> showed pretty decent speedups, and the lack of that feature is
> probably a bigger problem than any of this stuff, who knows...).
> Another beginner thought is that the DSA allocator is a source of
> contention during loading: the dumbest problem is that the chunks are
> just too small, but it might also be interesting to look into per-node
> pools. Or something. IDK, just some thoughts...
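Helping on a hash table only when the CPU and the data sit on the same node
makes sense to me. To make that a bit more concrete, here is a minimal
sketch (plain libnuma, outside of PostgreSQL; the Batch struct and its
home_node field are hypothetical stand-ins for per-batch placement
information) of a worker preferring batches homed on its own node before
falling back to today's round-robin behavior:

/*
 * Hypothetical sketch: pick the next hash-join batch to help with,
 * preferring batches whose memory lives on this worker's NUMA node.
 * Compile with: gcc -D_GNU_SOURCE sketch.c -lnuma
 */
#include <numa.h>       /* libnuma: numa_available(), numa_node_of_cpu() */
#include <sched.h>      /* sched_getcpu(), needs _GNU_SOURCE */
#include <stdbool.h>
#include <stdio.h>

typedef struct Batch
{
    int     home_node;  /* hypothetical: node the batch memory lives on */
    bool    done;       /* already fully processed? */
} Batch;

/*
 * Return the index of the next batch to work on, or -1 if none is left.
 * First pass: only batches homed on my node.  Second pass: plain
 * round-robin over whatever remains, i.e. the current behavior.
 */
static int
choose_batch(Batch *batches, int nbatch, int start)
{
    int     my_node = numa_node_of_cpu(sched_getcpu());

    for (int pass = 0; pass < 2; pass++)
    {
        for (int i = 0; i < nbatch; i++)
        {
            int     b = (start + i) % nbatch;

            if (batches[b].done)
                continue;
            if (pass == 0 && batches[b].home_node != my_node)
                continue;       /* local batches only on the first pass */
            return b;
        }
    }
    return -1;
}

int
main(void)
{
    if (numa_available() < 0)
    {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }

    Batch   batches[] = {{0, false}, {1, false}, {0, true}, {1, false}};

    printf("next batch: %d\n", choose_batch(batches, 4, 0));
    return 0;
}

The second pass keeps the current round-robin as the fallback, so a worker
never sits idle just because all the remaining batches are remote.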
I'm also thinking this could be beneficial for parallel workers more
generally. The ideal scenario, I think, would be to have the parallel
workers spread across the NUMA nodes, each accessing its "local" memory
first (and helping with "remote" memory accesses only if there is still
work to do remotely). On the shared memory side, interleaving could look
like the sketch below.
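A minimal sketch of what explicitly interleaving a freshly mmap'd segment
could look like, assuming libnuma is available (the 1 GB size and the
memset are just stand-ins for the real shmem setup and first touch):

/*
 * Minimal sketch: interleave an anonymous shared segment across all
 * NUMA nodes with libnuma, roughly what interleaving the main shmem
 * segment at startup could look like.  Compile with: gcc sketch.c -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int
main(void)
{
    size_t  size = 1UL << 30;   /* 1 GB, stand-in for shared_buffers */
    void   *seg;

    if (numa_available() < 0)
    {
        fprintf(stderr, "no NUMA support on this machine\n");
        return 1;
    }

    seg = mmap(NULL, size, PROT_READ | PROT_WRITE,
               MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED)
        return 1;

    /*
     * Ask the kernel to interleave the (not yet faulted-in) pages
     * across all allowed nodes; placement happens at first touch.
     */
    numa_interleave_memory(seg, size, numa_all_nodes_ptr);

    memset(seg, 0, size);       /* fault the pages in */
    return 0;
}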
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com