From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improving connection scalability (src/backend/storage/ipc/procarray.c)
Date: 2022-05-29 16:00:14
Message-ID: 6ec612ba-1c97-6845-5ee2-964b80996898@enterprisedb.com
Lists: pgsql-hackers
On 5/28/22 16:12, Ranier Vilela wrote:
> On Sat, May 28, 2022 at 09:00, Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> On 5/28/22 02:15, Ranier Vilela wrote:
> >
> >
> > On Fri, May 27, 2022 at 18:08, Andres Freund
> > <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2022-05-27 03:30:46 +0200, Tomas Vondra wrote:
> > > On 5/27/22 02:11, Ranier Vilela wrote:
> > > > ./pgbench -M prepared -c $conns -j $conns -T 60 -S -n -U postgres
> > > >
> > > > pgbench (15beta1)
> > > > transaction type: <builtin: select only>
> > > > scaling factor: 1
> > > > query mode: prepared
> > > > number of clients: 100
> > > > number of threads: 100
> > > > maximum number of tries: 1
> > > > duration: 60 s
> > > >
> > > > conns tps head tps patched
> > > >
> > > > 1 17126.326108 17792.414234
> > > > 10 82068.123383 82468.334836
> > > > 50 73808.731404 74678.839428
> > > > 80 73290.191713 73116.553986
> > > > 90 67558.483043 68384.906949
> > > > 100 65960.982801 66997.793777
> > > > 200 62216.011998 62870.243385
> > > > 300 62924.225658 62796.157548
> > > > 400 62278.099704 63129.555135
> > > > 500 63257.930870 62188.825044
> > > > 600 61479.890611 61517.913967
> > > > 700 61139.354053 61327.898847
> > > > 800 60833.663791 61517.913967
> > > > 900 61305.129642 61248.336593
> > > > 1000 60990.918719 61041.670996
> > > >
> > >
> > > These results look much saner, but IMHO it also does not show any
> > > clear benefit of the patch. Or are you still claiming there is a
> > > benefit?
> >
> > They don't look all that sane to me - isn't that way lower than one
> > would expect?
> >
> > Yes, quite disappointing.
> >
> > Restricting both client and server to the same four cores, on a
> > thermally challenged older laptop I have around, I get 150k tps at
> > both 10 and 100 clients.
> >
> > And can you share the benchmark details? Hardware, postgres and
> > pgbench, please?
> >
> >
> > Either way, I'd not expect to see any GetSnapshotData() scalability
> > effects to show up on an "Intel® Core™ i5-8250U CPU Quad Core" -
> > there's just not enough concurrency.
> >
> > This means that our customers will not see any connection scalability
> > improvements with PG15, using the simplest hardware?
> >
>
> No. It means that on a 4-core machine GetSnapshotData() is unlikely to be
> a bottleneck, because you'll hit various other bottlenecks way earlier.
>
> I personally doubt it even makes sense to worry too much about scaling
> to this many connections on such a tiny system.
>
>
> >
> > The correct pieces of these changes seem very unlikely to affect
> > GetSnapshotData() performance meaningfully.
> >
> > To improve something like GetSnapshotData() you first have to come up
> > with a workload that shows it being a meaningful part of a profile.
> > Unless it is, performance differences are going to just be due to
> > various forms of noise.
> >
> > Actually in the profiles I got with perf, GetSnapshotData() didn't
> > show up.
> >
>
> But that's exactly the point Andres is trying to make - if you don't see
> GetSnapshotData() in the perf profile, why do you think optimizing it
> will have any meaningful impact on throughput?
>
> You see, I've seen in several places that GetSnapshotData() is the
> bottleneck in scaling connections.
> One of them, if I remember correctly, was at an IBM in Russia.
> Another statement occurs in [1][2][3]
No one is claiming GetSnapshotData() can't be a bottleneck on systems
with many cores. That's certainly possible, which is why e.g. Andres
spent a lot of time optimizing for that case.
But that's not really the point of contention here. What we're arguing
about is whether your patch actually improves things, and you're
supporting that claim with numbers from a machine that is unlikely to be
hitting this bottleneck.
> Just because I don't have enough hardware to make GetSnapshotData() the
> bottleneck doesn't mean optimizing it won't make a difference.
Well, the question is whether it actually optimizes things. Maybe it does,
which would be great, but people in this thread (including me) seem to
be fairly skeptical about that claim, because the results are frankly
entirely unconvincing.
I doubt we'll just accept changes in such sensitive places without
results from a relevant machine. Maybe if there were clear agreement
that it's a win, but that's not the case here.
> And even on my modest hardware, we've seen gains, small but consistent.
> So IMHO everyone will benefit, including the small servers.
>
No, we haven't seen any convincing gains. I've tried to explain multiple
times that the results you've shared do not show any clear
improvement, due to only having one run for each client count (which
means there's a lot of noise), impact of binary layout in different
builds, etc. You've ignored all of that, so instead of repeating myself,
I did a simple benchmark on my two machines:
1) i5-2500k / 4 cores and 8GB RAM (so similar to what you have)
2) 2x e5-2620v3 / 16/32 cores, 64GB RAM (so somewhat bigger)
and I tested 1, 2, 5, 10, 50, 100, ...., 1000 clients using the same
benchmark as you (pgbench -S -M prepared ...). I did 10 runs for each
client count, so I could take the median, which evens out the noise.
And for fun I tried this with gcc 9.3, 10.3 and 11.2. The script and
results from both machines are attached.
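In case anyone wants to reproduce this, the benchmark loop looks roughly
like this (a simplified sketch, not the exact attached run.sh; the database
name, user and result handling are placeholders):
    #!/bin/bash
    # sketch of the benchmark loop: 10 runs of a 60s read-only pgbench
    # for each client count, so a per-count median can be computed later
    RUNS=10
    DURATION=60
    for clients in 1 2 5 10 20 30 40 50 100 200 300 400 500 600 700 800 900 1000; do
        for run in $(seq 1 $RUNS); do
            pgbench -n -S -M prepared -c $clients -j $clients -T $DURATION \
                -U postgres postgres >> pgbench-$clients.log
        done
    done
    # tps values are then extracted from the logs and the median per
    # client count is what the tables below show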
The results from xeon and gcc 11.2 look like this:
clients master patched diff
---------------------------------------
1 46460 44936 97%
2 87486 84746 97%
5 199102 192169 97%
10 344458 339403 99%
20 515257 512513 99%
30 528675 525467 99%
40 592761 594384 100%
50 694635 706193 102%
100 643950 655238 102%
200 690133 696815 101%
300 670403 677818 101%
400 678573 681387 100%
500 665349 678722 102%
600 666028 670915 101%
700 662316 662511 100%
800 647922 654745 101%
900 650274 654698 101%
1000 644482 649332 101%
Please explain to me how this shows a consistent, measurable improvement.
The standard deviation is roughly 1.5% on average, and the difference is
well within that range. Even if there were a tiny improvement for the
high client counts, no one sane will run with that many clients, because
the throughput peaks at ~50 clients. So even if you gain 1% with 500
clients, it's still less than with 50 clients. If anything, this shows a
regression for lower client counts.
FWIW this entirely ignores the question of whether this benchmark even
hits the bottleneck this patch aims to improve. Also, there's the
question of correctness, and I'd bet Andres is right that getting a
snapshot without holding ProcArrayLock is busted.
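FWIW checking the former is not hard - something along these lines (a rough
sketch; it assumes perf is installed and the postgres binary has symbols,
and the sampling rate/duration are arbitrary), run on the server while
pgbench is hammering it, shows whether GetSnapshotData() is a meaningful
part of the profile at all:
    # sample the whole machine for 30 seconds during the benchmark
    perf record -F 99 -a -g -- sleep 30
    # then look for GetSnapshotData in the report
    perf report --stdio | grep -i getsnapshotdata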
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments:
  xeon-chart.pdf  (application/pdf, 26.0 KB)
  i5.csv          (text/csv, 38.3 KB)
  run.sh          (application/x-shellscript, 318 bytes)
  xeon.csv        (text/csv, 38.4 KB)
  i5.pdf          (application/pdf, 25.6 KB)