From: | Andres Freund <andres(at)2ndquadrant(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Misaligned BufferDescriptors causing major performance problems on AMD |
Date: | 2014-02-02 15:13:19 |
Message-ID: | 20140202151319.GD32123@awork2.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
In the nearby thread at
http://archives.postgresql.org/message-id/20140202140014.GM5930%40awork2.anarazel.de
Peter and I discovered that there is a large performance difference
between different max_connections on a larger machine (4x Opteron 6272,
64 cores together) in a readonly pgbench tests...
Just as reference, we're talking about a performance degradation from
475963.613865 tps to 197744.913556 in a pgbench -S -cj64 just by setting
max_connections to 90, from 91...
On 2014-02-02 15:00:14 +0100, Andres Freund wrote:
> On 2014-02-01 19:47:29 -0800, Peter Geoghegan wrote:
> > Here are the results of a benchmark on Nathan Boley's 64-core, 4
> > socket server: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/amd-4-socket-rwlocks/
>
> That's interesting. The maximum number of what you see here (~293125)
> is markedly lower than what I can get.
>
> ... poke around ...
>
> Hm, that's partially because you're using pgbench without -M prepared if
> I see that correctly. The bottleneck in that case is primarily memory
> allocation. But even after that I am getting higher
> numbers: ~342497.
>
> Trying to nail down the differnce it oddly seems to be your
> max_connections=80 vs my 100. The profile in both cases is markedly
> different, way much more spinlock contention with 80. All in
> Pin/UnpinBuffer().
>
> I think =80 has to lead to some data being badly aligned. I can
> reproduce that =91 has *much* better performance than =90. 170841.844938
> vs 368490.268577 in a 10s test. Reproducable both with an without the test.
> That's certainly worth some investigation.
> This is *not* reproducable on the intel machine, so it might the
> associativity of the L1/L2 cache on the AMD.
So, I looked into this, and I am fairly certain it's because of the
(mis-)alignment of the buffer descriptors. With certain max_connections
settings InitBufferPool() happens to get 64byte aligned addresses, with
others not. I checked the alignment with gdb to confirm that.
A quick hack (attached) making BufferDescriptor 64byte aligned indeed
restored performance across all max_connections settings. It's not
surprising that a misaligned buffer descriptor causes problems -
there'll be plenty of false sharing of the spinlocks otherwise. Curious
that the the intel machine isn't hurt much by this.
Now all this hinges on the fact that by a mere accident
BufferDescriptors are 64byte in size:
struct sbufdesc {
BufferTag tag; /* 0 20 */
BufFlags flags; /* 20 2 */
uint16 usage_count; /* 22 2 */
unsigned int refcount; /* 24 4 */
int wait_backend_pid; /* 28 4 */
slock_t buf_hdr_lock; /* 32 1 */
/* XXX 3 bytes hole, try to pack */
int buf_id; /* 36 4 */
int freeNext; /* 40 4 */
/* XXX 4 bytes hole, try to pack */
LWLock * io_in_progress_lock; /* 48 8 */
LWLock * content_lock; /* 56 8 */
/* --- cacheline 1 boundary (64 bytes) --- */
/* size: 64, cachelines: 1, members: 10 */
/* sum members: 57, holes: 2, sum holes: 7 */
};
We could polish up the attached patch and apply it to all the branches,
the costs of memory are minimal. But I wonder if we shouldn't instead
make ShmemInitStruct() always return cacheline aligned addresses. That
will require some fiddling, but it might be a good idea nonetheless?
I think we should also consider some more reliable measures to have
BufferDescriptors cacheline sized, rather than relying on the happy
accident. Debugging alignment issues isn't fun, too much of a guessing
game...
Thoughts?
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment | Content-Type | Size |
---|---|---|
keep-BufferDescriptors-cacheline-aligned.patch | text/x-patch | 1.0 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2014-02-02 15:18:07 | Re: [9.3 bug] disk space in pg_xlog increases during archive recovery |
Previous Message | Fujii Masao | 2014-02-02 14:50:40 | Re: [9.3 bug] disk space in pg_xlog increases during archive recovery |