From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Peter Geoghegan <pg(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Misaligned BufferDescriptors causing major performance problems on AMD |
Date: | 2015-01-01 18:58:02 |
Message-ID: | 20150101185802.GA13930@momjian.us |
Lists: | pgsql-hackers |
On Thu, Jan 1, 2015 at 05:59:25PM +0100, Andres Freund wrote:
> > That seems like a strange approach. I think it's pretty sensible to
> > try to ensure that allocated blocks of shared memory have decent
> > alignment, and we don't have enough of them for aligning on 64-byte
> > boundaries (or even 128-byte boundaries, perhaps) to eat up any
> > meaningful amount of memory. The BUFFERALIGN() stuff, like much else
> > about the way we manage shared memory, has also made its way into the
> > dynamic-shared-memory code. So if we do adjust the alignment that we
> > guarantee for the main shared memory segment, we should perhaps adjust
> > DSM to match. But I guess I don't understand why you'd want to do it
> > that way.
>
> The problem is that just aligning the main allocation to some boundary
> doesn't mean the hot part of the allocation is properly aligned. shmem.c
> in fact can't really do much about that - so fully moving the
> responsibility seems more likely to ensure that future code thinks about
> alignment.
Yes, there are two separate things here: shared memory allocation alignment
and object alignment. Since there are only about 50 such allocations, a
worst-case change to force 64-byte alignment on all of them would cost only
about 3.2kB of shared memory (50 * 64 bytes).
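To make the allocation-alignment side and the cost estimate concrete, here is a minimal C sketch of rounding an allocation offset up to a 64-byte boundary. The helper names (`align_up`, `padding_needed`, `worst_case_overhead`) and `SKETCH_CACHE_LINE` are hypothetical; the rounding uses the same power-of-two mask trick as PostgreSQL's TYPEALIGN()-style macros, but this is an illustration, not the actual shmem.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache-line size; 64 bytes is typical of current x86-64 CPUs. */
#define SKETCH_CACHE_LINE 64

/*
 * Round len up to the next multiple of align (align must be a power of two).
 * Same mask trick as PostgreSQL's TYPEALIGN() family, but illustrative only.
 */
static inline uintptr_t
align_up(uintptr_t len, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (len + (align - 1)) & ~(align - 1);
}

/* Padding bytes needed to move offset forward to the next align boundary. */
static inline size_t
padding_needed(uintptr_t offset, uintptr_t align)
{
	return (size_t) (align_up(offset, align) - offset);
}

/*
 * Upper bound on extra memory if each of n allocations is forced onto an
 * align boundary: each wastes at most align - 1 bytes, so n * align is a
 * safe round-number bound.
 */
static inline size_t
worst_case_overhead(size_t n_allocations, size_t align)
{
	return n_allocations * align;
}
```

Each allocation wastes at most 63 bytes of padding, so 50 * 64 bytes is a slightly generous upper bound that matches the roughly 3.2k estimate.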
It might make sense to make them all 64-byte aligned to reduce CPU cache
contention, but we would need actual performance numbers to justify that.
My two patches allow individual object alignment to be tested. I have not
been able to measure any performance difference (all under 1%) with:
$ pgbench --initialize --scale 100 pgbench
$ pgbench --protocol prepared --client 32 --jobs 16 --time=100 --select-only pgbench
on my dual-socket 16 vcore server.
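Object alignment is separate from where the allocation starts: a hot array stays misaligned unless each element is padded to a cache-line multiple and the array base is itself line-aligned. The following is a hedged C11 sketch of that pattern, assuming a 64-byte cache line; `SketchDesc`, `SketchDescPadded`, `alloc_descs`, and `is_line_aligned` are hypothetical illustrations, not PostgreSQL's BufferDesc definitions, and `aligned_alloc()` merely stands in for the shared memory allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size for this sketch. */
#define SKETCH_CACHE_LINE 64

/* Hypothetical hot per-buffer descriptor; the fields are illustrative only. */
typedef struct SketchDesc
{
	uint32_t	tag;
	uint32_t	state;
	uint32_t	refcount;
	int32_t		freeNext;
} SketchDesc;

/*
 * Pad each descriptor to a full cache line so neighbouring descriptors never
 * share a line, provided the array itself starts on a line boundary.
 */
typedef union SketchDescPadded
{
	SketchDesc	desc;
	char		pad[SKETCH_CACHE_LINE];
} SketchDescPadded;

_Static_assert(sizeof(SketchDesc) <= SKETCH_CACHE_LINE,
			   "descriptor must fit in one cache line");
_Static_assert(sizeof(SketchDescPadded) == SKETCH_CACHE_LINE,
			   "padded descriptor must be exactly one cache line");

/*
 * Allocate n padded descriptors with a cache-line-aligned base.  Padding the
 * type alone is not enough: with a misaligned base every element would
 * straddle two lines.  C11 aligned_alloc() requires the size to be a multiple
 * of the alignment, which holds because each element is exactly one line.
 * Returns NULL on allocation failure.
 */
static SketchDescPadded *
alloc_descs(size_t n)
{
	assert(n > 0);				/* a zero-size request is implementation-defined */
	return aligned_alloc(SKETCH_CACHE_LINE, n * sizeof(SketchDescPadded));
}

/* True if p starts exactly on a cache-line boundary. */
static int
is_line_aligned(const void *p)
{
	return ((uintptr_t) p % SKETCH_CACHE_LINE) == 0;
}
```

Padding the element type and aligning the base are both required; either one alone leaves some descriptors sharing or straddling cache lines.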
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +