Re: Wait free LW_SHARED acquisition - v0.9

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Date: 2014-10-11 00:59:01
Message-ID: 20141011005901.GF6724@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-10-11 06:18:11 +0530, Amit Kapila wrote:
> On Fri, Oct 10, 2014 at 8:11 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> > On 2014-10-10 17:18:46 +0530, Amit Kapila wrote:
> > > On Fri, Oct 10, 2014 at 1:27 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> > > wrote:
> > > > > Observations
> > > > > ----------------------
> > > > > a. The patch performs really well (increase up to ~40%) in case all
> > > > > the data fits in shared buffers (scale factor -100).
> > > > > b. In case data doesn't fit in shared buffers, but fits in RAM
> > > > > (scale factor -3000), there is performance increase up to 16 client
> > > > > count, however after that it starts dipping (in above config up to ~4.4%).
> > > >
> > > > Hm. Interesting. I don't see that dip on x86.
> > >
> > > Is it possible that implementation of some atomic operation is costlier
> > > for particular architecture?
> >
> > Yes, sure. And IIRC POWER improved atomics performance considerably for
> > POWER8...
> >
> > > I have tried again for scale factor 3000 and could see the dip and this
> > > time I have even tried with 175 client count and the dip is approximately
> > > 5% which is slightly more than 160 client count.

I've run some short tests on hydra:

scale 1000:

base:
4GB:
tps = 296273.004800 (including connections establishing)
tps = 296373.978100 (excluding connections establishing)

8GB:
tps = 338001.455970 (including connections establishing)
tps = 338177.439106 (excluding connections establishing)

base + freelist:
4GB:
tps = 297057.523528 (including connections establishing)
tps = 297156.987418 (excluding connections establishing)

8GB:
tps = 335123.867097 (including connections establishing)
tps = 335239.122472 (excluding connections establishing)

base + LW_SHARED:
4GB:
tps = 296262.164455 (including connections establishing)
tps = 296357.524819 (excluding connections establishing)

8GB:
tps = 336988.744742 (including connections establishing)
tps = 337097.836395 (excluding connections establishing)

base + LW_SHARED + freelist:
4GB:
tps = 296887.981743 (including connections establishing)
tps = 296980.231853 (excluding connections establishing)

8GB:
tps = 345049.062898 (including connections establishing)
tps = 345161.947055 (excluding connections establishing)

I've also run some preliminary tests using scale=3000 - and I couldn't
see a performance difference either.

Note that all these are noticeably faster than your results.
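
To put the scale 1000 numbers above in relative terms, here is a small
sketch that turns two tps values into a percentage change. rel_change is a
made-up helper name, not part of pgbench, and the inputs are the "excluding
connections establishing" figures from the runs above:

```shell
#!/bin/sh
# rel_change is a hypothetical helper: percent change of $2 relative to $1,
# printed with two decimal places.
rel_change() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.2f\n", (new - base) / base * 100 }'
}

# tps (excluding connections establishing), scale 1000
rel_change 296373.978100 296980.231853   # 4GB: base vs. base + LW_SHARED + freelist
rel_change 338177.439106 345161.947055   # 8GB: base vs. base + LW_SHARED + freelist
```

The same helper can be pointed at any other pair of results in the list.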

> > >
> > > Lwlock_contention patches - client_count=128
> > > ----------------------------------------------------------------------
> > >
> > > + 7.95% postgres postgres [.] GetSnapshotData
> > > + 3.58% postgres postgres [.] AllocSetAlloc
> > > + 2.51% postgres postgres [.] _bt_compare
> > > + 2.44% postgres postgres [.] hash_search_with_hash_value
> > > + 2.33% postgres [kernel.kallsyms] [k] .__copy_tofrom_user
> > > + 2.24% postgres postgres [.] AllocSetFreeIndex
> > > + 1.75% postgres postgres [.] pg_atomic_fetch_add_u32_impl
> >
> > Uh. Huh? Normally that'll be inline. That's compiled with gcc? What were
> > the compiler settings you used?
>
> Nothing specific, for performance tests where I have to take profiles
> I use below:
> ./configure --prefix=<installation_path> CFLAGS="-fno-omit-frame-pointer"
> make

Hah. Doing so overrides the CFLAGS configure normally sets. See this comment
in configure:
# CFLAGS are selected so:
# If the user specifies something in the environment, that is used.
# else: If the template file set something, that is used.
# else: If coverage was enabled, don't set anything.
# else: If the compiler is GCC, then we use -O2.
# else: If the compiler is something else, then we use -O, unless debugging.

So if you configure as above, you're compiling without optimizations. Include
at least -O2 as well.
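
As a rough illustration of the precedence in the configure comment quoted
above, here is a minimal sketch. select_cflags and its arguments are
hypothetical names made up for this example, not configure's actual code,
and it treats an empty value as "not specified":

```shell
#!/bin/sh
# Sketch only: mirrors the five rules from the quoted comment.
# select_cflags and its parameters are illustrative, not taken from configure.
select_cflags() {
    user_cflags=$1      # CFLAGS given by the user ("" if none)
    template_cflags=$2  # CFLAGS from the template file ("" if none)
    coverage=$3         # "yes" if coverage is enabled
    is_gcc=$4           # "yes" if the compiler is GCC
    debug=$5            # "yes" if debugging is enabled

    if [ -n "$user_cflags" ]; then
        echo "$user_cflags"          # user value wins, nothing is added
    elif [ -n "$template_cflags" ]; then
        echo "$template_cflags"
    elif [ "$coverage" = yes ]; then
        echo ""                      # coverage: don't set anything
    elif [ "$is_gcc" = yes ]; then
        echo "-O2"
    elif [ "$debug" = yes ]; then
        echo ""                      # non-GCC with debugging: no -O
    else
        echo "-O"
    fi
}

# Only the frame-pointer flag comes back; -O2 is never added:
select_cflags "-fno-omit-frame-pointer" "" no yes no
# Passing the optimization level explicitly keeps both:
select_cflags "-O2 -fno-omit-frame-pointer" "" no yes no
```

For the profiling build above that would mean something like:
./configure --prefix=<installation_path> CFLAGS="-O2 -fno-omit-frame-pointer"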

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
