From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: spin_delay() for ARM
Date: 2020-04-16 07:32:34
Message-ID: CAFj8pRDYc+t4oDBa01ErU3oSa2sMUSki3A2sqALz9rjo50034w@mail.gmail.com
Lists: pgsql-hackers
On Thu, 16 Apr 2020 at 09:18, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
wrote:
> On Mon, 13 Apr 2020 at 20:16, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
> wrote:
> > On Sat, 11 Apr 2020 at 04:18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >
> > > I wrote:
> > > > A more useful test would be to directly experiment with contended
> > > > spinlocks. As I recall, we had some test cases laying about when
> > > > we were fooling with the spin delay stuff on Intel --- maybe
> > > > resurrecting one of those would be useful?
> > >
> > > The last really significant performance testing we did in this area
> > > seems to have been in this thread:
> > >
> > >
> https://www.postgresql.org/message-id/flat/CA%2BTgmoZvATZV%2BeLh3U35jaNnwwzLL5ewUU_-t0X%3DT0Qwas%2BZdA%40mail.gmail.com
> > >
> > > A relevant point from that is Haas' comment
> > >
> > > I think optimizing spinlocks for machines with only a few CPUs is
> > > probably pointless. Based on what I've seen so far, spinlock
> > > contention even at 16 CPUs is negligible pretty much no matter what
> > > you do. Whether your implementation is fast or slow isn't going to
> > > matter, because even an inefficient implementation will account for
> > > only a negligible percentage of the total CPU time - much less than 1%
> > > - as opposed to a 64-core machine, where it's not that hard to find
> > > cases where spin-waits consume the *majority* of available CPU time
> > > (recall previous discussion of lseek).
> >
> > Yeah, will check if I can find some machines with a large number of cores.
>
> I got hold of a 32-CPU VM (actually a 16-core machine, but with
> hyperthreading it presents 32 CPUs).
> It was an Intel Xeon, 3 GHz CPU, with 15 GB of available memory. Hypervisor:
> KVM. Single NUMA node.
> PG parameters changed: shared_buffers = 8GB; max_connections = 1000
>
> I compared pgbench results for HEAD versus HEAD with the PAUSE removed, like this:
> perform_spin_delay(SpinDelayStatus *status)
> {
> - /* CPU-specific delay each time through the loop */
> - SPIN_DELAY();
>
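(For context, the removed SPIN_DELAY() is only the per-iteration CPU hint; the
rest of the back-off logic in s_lock.c stays in place. A simplified paraphrase
of perform_spin_delay() - not the verbatim source, and details differ across
versions - looks roughly like this:

void
perform_spin_delay(SpinDelayStatus *status)
{
    /* CPU-specific delay each time through the loop (PAUSE on x86) */
    SPIN_DELAY();

    /* Sleep in the kernel once every spins_per_delay iterations */
    if (++(status->spins) >= spins_per_delay)
    {
        if (++(status->delays) > NUM_DELAYS)
            s_lock_stuck(status->file, status->line, status->func);

        if (status->cur_delay == 0)     /* first time to delay? */
            status->cur_delay = MIN_DELAY_USEC;

        pg_usleep(status->cur_delay);

        /* ... grow cur_delay for the next sleep, capped at MAX_DELAY_USEC ... */

        status->spins = 0;
    }
}

So the patched build still falls back to pg_usleep() after spins_per_delay
tries; only the CPU hint executed on every spin is gone.)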
> Ran with an increasing number of parallel clients:
> pgbench -S -c $num -j $num -T 60 -M prepared
> But I couldn't find any significant change in the TPS numbers with or
> without PAUSE:
>
> Clients HEAD Without_PAUSE
> 8 244446 247264
> 16 399939 399549
> 24 454189 453244
> 32 1097592 1098844
> 40 1090424 1087984
> 48 1068645 1075173
> 64 1035035 1039973
> 96 976578 970699
>
> Maybe it will indeed show some difference only at around 64 cores,
> or perhaps a bare-metal machine would help; but as of now I couldn't get
> such a machine. Anyway, I thought why not archive the results with
> whatever I have.
>
> Not relevant to the PAUSE stuff ... Note that when the parallel
> clients go from 24 to 32 (which equals the machine's CPU count), the TPS
> shoots from 454189 to 1097592, which is more than double the speed with
> just a ~30% increase in parallel sessions. I was not expecting this
> much of a gain, because in the contended scenario the pgbench processes
> are already taking around 20% of the total CPU time of the pgbench run.
> Maybe later on I will get a chance to run with a customized pgbench
> script that calls a server function which keeps running an index scan
> on pgbench_accounts, so as to make the pgbench clients almost idle.
>
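(A sketch of what such a test could look like - the function name
busy_index_scan, the file name busy.sql and the loop count are made up here,
only to illustrate the idea:

create or replace function busy_index_scan(iters int) returns bigint
language plpgsql as $$
declare
    total bigint := 0;
    b     integer;
begin
    for i in 1..iters loop
        -- each iteration is a primary-key index scan on pgbench_accounts
        select abalance into b
          from pgbench_accounts
         where aid = 1 + (i % 100000);
        total := total + coalesce(b, 0);
    end loop;
    return total;
end;
$$;

-- busy.sql, run as:  pgbench -n -f busy.sql -c 32 -j 32 -T 60
select busy_index_scan(100000);

Each client then spends nearly all of its time inside a single server call
rather than in the pgbench driver.)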
As far as I know, pgbench cannot be used for testing spinlock problems.
Maybe you can see this issue when you use a higher number of clients -
hundreds or thousands - and decrease shared memory, so that there is pressure
on the related spinlock.
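For illustration only (the numbers are arbitrary, untested), that could mean
something like:

    shared_buffers = 128MB
    pgbench -S -M prepared -c 512 -j 64 -T 60

i.e. many more clients than CPUs, with shared_buffers much smaller than the
pgbench data set.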
Regards
Pavel
> Thanks
> -Amit Khandekar
>
>
>