From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Date: 2022-05-29 17:26:44
Message-ID: CAEudQArDKoDAD8i4=5Hq=Y7aD8+xMMxkhe07ocLJHFZGVQKKkw@mail.gmail.com
Lists: pgsql-hackers
On Sat, May 28, 2022 at 09:35, Tomas Vondra <
tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> On 5/28/22 02:36, Ranier Vilela wrote:
> > On Fri, May 27, 2022 at 18:22, Andres Freund <andres(at)anarazel(dot)de>
> > wrote:
> >
> > Hi,
> >
> > On 2022-05-27 10:35:08 -0300, Ranier Vilela wrote:
> > > On Thu, May 26, 2022 at 22:30, Tomas Vondra <
> > > tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> > >
> > > > On 5/27/22 02:11, Ranier Vilela wrote:
> > > > >
> > > > > ...
> > > > >
> > > > > Here the results with -T 60:
> > > >
> > > > Might be a good idea to share your analysis / interpretation of the
> > > > results, not just the raw data. After all, the change is being
> > > > proposed by you, so do you think this shows the change is beneficial?
> > > >
> > > I think so, but my expectations have diminished.
> > > I expected that the more connections, the better the performance.
> > > And for both patch and head, this doesn't happen in the tests.
> > > Performance degrades with a greater number of connections.
> >
> > Your system has four CPUs. Once they're all busy, adding more connections
> > won't improve performance. It'll just add more and more context switching,
> > cache misses, and make the OS scheduler do more work.
> >
> > conns tps head
> > 10 82365.634750
> > 50 74593.714180
> > 80 69219.756038
> > 90 67419.574189
> > 100 66613.771701
> > Yes, it is quite disappointing that with 100 connections the tps is lower
> > than with 10 connections.
> >
>
> IMO that's entirely expected on a system with only 4 cores. Increasing
> the number of connections inevitably means more overhead (you have to
> track/manage more stuff). And at some point the backends start competing
> for L2/L3 caches, context switches are not free either, etc. So once you
> cross ~2-3x the number of cores, you should expect this.
>
> This behavior is natural/inherent, it's unlikely to go away, and it's
> one of the reasons why we recommend not to use too many connections. If
> you try to maximize throughput, just don't do that. Or just use machine
> with more cores.
>
> >
> > > GetSnapshotData() isn't a bottleneck?
> >
> > I'd be surprised if it showed up in a profile on your machine with that
> > workload in any sort of meaningful way. The snapshot reuse logic will
> > always work - because there are no writes - and thus the only work that
> > needs to be done is to acquire the ProcArrayLock briefly. And because
> > there is only a small number of cores, contention on the cacheline for
> > that isn't a problem.
> >
> > Thanks for sharing this.
> >
> > > > These results look much saner, but IMHO it also does not show any
> > > > clear benefit of the patch. Or are you still claiming there is a
> > > > benefit?
> > > >
> > > We agree that they are micro-optimizations. However, I think they should
> > > be considered micro-optimizations in inner loops, because everything in
> > > procarray.c is a hot path.
> >
> > As explained earlier, I don't agree that they optimize anything - you're
> > making some of the scalability behaviour *worse*, if it's changed at all.
> >
> >
> > > The first objective, I believe, was achieved: no performance
> > > regression.
> > > I agree the gains are small, based on the tests done.
> >
> > There are no gains.
> >
> > IMHO, I must disagree.
> >
>
> You don't have to, really. What you should do is show results
> demonstrating the claimed gains, and so far you have not done that.
>
> I don't want to be rude, but so far you've shown results from a
> benchmark testing fork(), due to only running 10 transactions per
> client, and then results from a single run for each client count (which
> doesn't really show any gains either, and is quite noisy).
>
> As mentioned GetSnapshotData() is not even in perf profile, so why would
> the patch even make a difference?
>
> You've also claimed it helps generating better code on older compilers,
> but you've never supported that with any evidence.
>
>
> Maybe there is an improvement - show us. Do a benchmark with more runs,
> to average-out the noise. Calculate VAR/STDEV to show how variable the
> results are. Use that to compare results and decide if there is an
> improvement. Also, keep in mind binary layout matters [1].
>
I redid the benchmark on a better machine:
Intel i7-10510U
8 GB RAM
512 GB SSD
Linux Ubuntu 64-bit
All files are attached, including the raw data of the results.
I did the calculations as requested.
A quick average of the 10 benchmark runs shows about 10,000 tps more.
Not bad for a simple patch made entirely of micro-optimizations.
Results attached.
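The per-run mean/stddev calculation could be sketched like this (a minimal sketch in Python; the tps values below are placeholders, not the attached benchmark data):

```python
import statistics

# Hypothetical tps results from 10 pgbench runs for each build
# (placeholder numbers, NOT the actual attached benchmark data).
head_tps = [66613.8, 66891.2, 66102.5, 67010.3, 66750.1,
            66420.9, 66980.4, 66310.7, 66845.6, 66590.0]
patch_tps = [66700.2, 66950.8, 66250.3, 67100.9, 66800.5,
             66500.1, 67050.7, 66400.2, 66900.3, 66650.4]

def summarize(label, samples):
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample standard deviation
    print(f"{label}: mean={mean:.1f} tps, stdev={stdev:.1f}")
    return mean, stdev

head_mean, head_sd = summarize("head", head_tps)
patch_mean, patch_sd = summarize("patch", patch_tps)

# A mean difference much smaller than the run-to-run stdev is
# indistinguishable from noise.
diff = patch_mean - head_mean
print(f"difference: {diff:+.1f} tps (noise ~ {max(head_sd, patch_sd):.1f})")
```

With these placeholder numbers the difference is well inside one standard deviation, which is exactly the kind of comparison being asked for before claiming a gain.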
regards,
Ranier Vilela
Attachment: procarray_bench.tar.xz (application/x-xz, 49.2 KB)