From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Andres Freund <andres(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Peter Geoghegan <pg(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Misaligned BufferDescriptors causing major performance problems on AMD |
Date: | 2014-12-24 03:51:22 |
Message-ID: | 20141224035122.GA15375@momjian.us |
Lists: | pgsql-hackers |
On Thu, Apr 17, 2014 at 11:23:24AM +0200, Andres Freund wrote:
> On 2014-04-16 19:18:02 -0400, Bruce Momjian wrote:
> > On Thu, Feb 6, 2014 at 09:40:32AM +0100, Andres Freund wrote:
> > > On 2014-02-05 12:36:42 -0500, Robert Haas wrote:
> > > > >> It may well be that your proposal is spot on. But I'd like to see some
> > > > >> data-structure-by-data-structure measurements, rather than assuming that
> > > > >> alignment must be a good thing.
> > > > >
> > > > > I am fine with just aligning BufferDescriptors properly. That has
> > > > > clearly shown massive improvements.
> > > >
> > > > I thought your previous idea of increasing BUFFERALIGN to 64 bytes had
> > > > a lot to recommend it.
> > >
> > > Good.
> > >
> > > I wonder if we shouldn't move that bit of logic:
> > > if (size >= BUFSIZ)
> > > newStart = BUFFERALIGN(newStart);
> > > out of ShmemAlloc() and instead have a ShmemAllocAligned() and
> > > ShmemInitStructAligned() that does it. So we can sensibly control it
> > > per struct.
> > >
> > > > But that doesn't mean it doesn't need testing.
> > >
> > > I feel the need here, to say that I never said it doesn't need testing
> > > and never thought it didn't...
> >
> > Where are we on this?
>
> It needs somebody with time to evaluate possible performance regressions
> - I personally won't have time to look into this in detail before pgcon.
I am doing performance testing to try to complete this item. I used the
first attached patch to report which structures are 64-byte aligned:
64-byte shared memory alignment of Control File: 0
64-byte shared memory alignment of XLOG Ctl: 1
64-byte shared memory alignment of CLOG Ctl: 0
64-byte shared memory alignment of CommitTs Ctl: 0
64-byte shared memory alignment of CommitTs shared: 0
64-byte shared memory alignment of SUBTRANS Ctl: 1
64-byte shared memory alignment of MultiXactOffset Ctl: 1
64-byte shared memory alignment of MultiXactMember Ctl: 1
64-byte shared memory alignment of Shared MultiXact State: 1
64-byte shared memory alignment of Buffer Descriptors: 1
64-byte shared memory alignment of Buffer Blocks: 1
64-byte shared memory alignment of Shared Buffer Lookup Table: 1
64-byte shared memory alignment of Buffer Strategy Status: 1
64-byte shared memory alignment of LOCK hash: 0
64-byte shared memory alignment of PROCLOCK hash: 0
64-byte shared memory alignment of Fast Path Strong Relation Lock Data: 0
64-byte shared memory alignment of PREDICATELOCKTARGET hash: 0
64-byte shared memory alignment of PREDICATELOCK hash: 0
64-byte shared memory alignment of PredXactList: 0
64-byte shared memory alignment of SERIALIZABLEXID hash: 1
64-byte shared memory alignment of RWConflictPool: 1
64-byte shared memory alignment of FinishedSerializableTransactions: 0
64-byte shared memory alignment of OldSerXid SLRU Ctl: 1
64-byte shared memory alignment of OldSerXidControlData: 1
64-byte shared memory alignment of Proc Header: 0
64-byte shared memory alignment of Proc Array: 0
64-byte shared memory alignment of Backend Status Array: 0
64-byte shared memory alignment of Backend Application Name Buffer: 0
64-byte shared memory alignment of Backend Client Host Name Buffer: 0
64-byte shared memory alignment of Backend Activity Buffer: 0
64-byte shared memory alignment of Prepared Transaction Table: 0
64-byte shared memory alignment of Background Worker Data: 0
64-byte shared memory alignment of shmInvalBuffer: 1
64-byte shared memory alignment of PMSignalState: 0
64-byte shared memory alignment of ProcSignalSlots: 0
64-byte shared memory alignment of Checkpointer Data: 0
64-byte shared memory alignment of AutoVacuum Data: 0
64-byte shared memory alignment of Wal Sender Ctl: 0
64-byte shared memory alignment of Wal Receiver Ctl: 0
64-byte shared memory alignment of BTree Vacuum State: 0
64-byte shared memory alignment of Sync Scan Locations List: 0
64-byte shared memory alignment of Async Queue Control: 0
64-byte shared memory alignment of Async Ctl: 0
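For readers without the attachment, the 0/1 flags in a report like the one above can come from a simple modulo test on each structure's start address. The following is only a minimal sketch under that assumption, not the contents of align.diff; the names CACHE_LINE_SIZE, is_cache_line_aligned() and cache_line_align() are hypothetical and are not PostgreSQL APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes matches the report above. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical helper: returns true when ptr sits on a 64-byte boundary.
 * This is the kind of check that would yield the 0/1 flag in the report.
 */
static bool
is_cache_line_aligned(const void *ptr)
{
    return ((uintptr_t) ptr % CACHE_LINE_SIZE) == 0;
}

/*
 * Hypothetical helper: round an offset up to the next multiple of
 * CACHE_LINE_SIZE. The mask trick requires a power-of-two size.
 */
static size_t
cache_line_align(size_t offset)
{
    return (offset + CACHE_LINE_SIZE - 1) & ~((size_t) CACHE_LINE_SIZE - 1);
}
```

A structure whose start address passes is_cache_line_aligned() would be reported as 1, anything else as 0.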
Many of these are 64-byte aligned, including Buffer Descriptors. I
tested pgbench with these commands:
$ pgbench -i -s 95 pgbench
$ pgbench -S -c 95 -j 95 -t 100000 pgbench
on a 16-core Xeon server and got 84k tps. I then applied another patch,
attached, which causes all the structures to be non-64-byte aligned, but
got the same tps number.
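One way to force every allocation off a 64-byte boundary, for comparison runs like this, is to round the start offset up to the boundary and then skew it by a small fixed amount. This is a sketch of that idea only and may differ from what noalign.diff actually does; CACHE_LINE_SIZE, MISALIGN_BYTES and force_misaligned() are hypothetical names.

```c
#include <stddef.h>

/* Assumed cache line size, as in the alignment report. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical skew. Any value from 1 to 63 breaks 64-byte alignment;
 * 8 is chosen here so that 8-byte alignment is still preserved.
 */
#define MISALIGN_BYTES 8

/*
 * Round offset up to a 64-byte boundary, then step past it by
 * MISALIGN_BYTES, so the result is never a multiple of 64.
 */
static size_t
force_misaligned(size_t offset)
{
    size_t aligned = (offset + CACHE_LINE_SIZE - 1)
        & ~((size_t) CACHE_LINE_SIZE - 1);

    return aligned + MISALIGN_BYTES;
}
```

Using a skew that is a multiple of 8 keeps the structures usable on platforms that require 8-byte alignment, while still guaranteeing they straddle cache lines differently from the aligned case.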
Can someone test these patches on an AMD CPU and report whether there is
a difference? Thanks.
--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +
Attachment | Content-Type | Size |
---|---|---|
align.diff | text/x-diff | 531 bytes |
noalign.diff | text/x-diff | 914 bytes |