From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Date: 2014-10-11 00:59:01
Message-ID: 20141011005901.GF6724@awork2.anarazel.de
Lists: pgsql-hackers
On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> > On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
> > > On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> > > wrote:
> > > > > Observations
> > > > > ----------------------
> > > > > a. The patch performs really well (increase up to ~40%) in case
> > > > > all the data fits in shared buffers (scale factor -100).
> > > > > b. In case data doesn't fit in shared buffers, but fits in RAM
> > > > > (scale factor -3000), there is a performance increase up to 16
> > > > > client count; however, after that it starts dipping (in the above
> > > > > config up to ~4.4%).
> > > >
> > > > Hm. Interesting. I don't see that dip on x86.
> > >
> > > Is it possible that implementation of some atomic operation is costlier
> > > for particular architecture?
> >
> > Yes, sure. And IIRC POWER improved atomics performance considerably for
> > POWER8...
> >
> > > I have tried again for scale factor 3000 and could see the dip and this
> > > time I have even tried with 175 client count and the dip is
> > > approximately 5% which is slightly more than 160 client count.
I've run some short tests on hydra:
scale 1000:
base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)
8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)
base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)
8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)
base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)
8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)
base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)
8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)
I've also run some preliminary tests using scale=3000 - and I couldn't
see a performance difference either.
Note that all these are noticeably faster than your results.
> > >
> > > Lwlock_contention patches - client_count=128
> > > ----------------------------------------------------------------------
> > >
> > > + 7.95% postgres postgres [.] GetSnapshotData
> > > + 3.58% postgres postgres [.] AllocSetAlloc
> > > + 2.51% postgres postgres [.] _bt_compare
> > > + 2.44% postgres postgres [.] hash_search_with_hash_value
> > > + 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user
> > > + 2.24% postgres postgres [.] AllocSetFreeIndex
> > > + 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl
> >
> > Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
> > the compiler settings you used?
>
> Nothing specific, for performance tests where I have to take profiles
> I use below:
> ./configure --prefix=<installation_path> CFLAGS="-fno-omit-frame-pointer"
> make
Hah. Doing so overwrites the CFLAGS that configure normally sets. Check how configure selects CFLAGS:
# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else: If the template file set something, that is used.
# else: If coverage was enabled, don't set anything.
# else: If the compiler is GCC, then we use -O2.
# else: If the compiler is something else, then we use -O, unless debugging.
So, if you configure like that, you're compiling without optimizations. Include at least -O2 as well.
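
For example, something along these lines (reusing the same <installation_path> placeholder from your command) keeps frame pointers for profiling while restoring the -O2 that configure would otherwise pick for GCC:

# keep frame pointers for perf, but don't drop the usual optimization level
./configure --prefix=<installation_path> CFLAGS="-O2 -fno-omit-frame-pointer"
make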
Greetings,
Andres Freund
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services