From: | Floris Van Nee <florisvannee(at)Optiver(dot)com> |
---|---|
To: | Floris Van Nee <florisvannee(at)Optiver(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Andres Freund <andres(at)anarazel(dot)de> |
Subject: | RE: Delaying/avoiding BTreeTupleGetNAtts() call within _bt_compare() |
Date: | 2020-01-28 21:34:34 |
Message-ID: | 2ca215d085a74e2eabd31b76e97cc9f3@opammb0561.comp.optiver.com |
Lists: | pgsql-hackers |
>
> I could do some tests with the patch on some larger machines. What exact
> tests do you propose? Are there some specific postgresql.conf settings and
> pgbench initialization you recommend for this? And was the test above just
> running 'pgbench -S' select-only with specific -T, -j and -c parameters?
>
Following Andres' instructions, I ran a couple of tests. With your patches I can reliably reproduce a speedup of ~3% in single-core tests on a dual-socket 36-core machine for the pgbench select-only test case. When using the full-scale test, my results are unfortunately far too noisy, even for large runs. I also tried some other queries (for example, SELECTs that return 10 or 100 rows instead of just 1), but I can't see much of a speedup there either, although the patch doesn't hurt.
So I guess the most noticeable one is the select-only benchmark for 1 core:
<Master>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 1
number of threads: 1
duration: 600 s
number of transactions actually processed: 30255419
latency average = 0.020 ms
latency stddev = 0.001 ms
tps = 50425.693234 (including connections establishing)
tps = 50425.841532 (excluding connections establishing)
<Patched>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 1
number of threads: 1
duration: 600 s
number of transactions actually processed: 31363398
latency average = 0.019 ms
latency stddev = 0.001 ms
tps = 52272.326597 (including connections establishing)
tps = 52272.476380 (excluding connections establishing)
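As a quick check on the size of the single-client gain, the relative tps change can be worked out from the two "excluding connections establishing" figures above. This is only a sketch; the helper name `relative_change` is made up for illustration and is not part of pgbench.

```python
def relative_change(baseline: float, patched: float) -> float:
    """Return the change from baseline to patched, as a percentage of baseline."""
    return (patched - baseline) / baseline * 100.0


# tps (excluding connections establishing), 1 client / 1 thread, from the runs above
master_tps = 50425.841532
patched_tps = 52272.476380

speedup_pct = relative_change(master_tps, patched_tps)
print(f"single-client speedup: {speedup_pct:.2f}%")
```

For these two runs the result lands a little above 3.6%, in line with the ~3% mentioned earlier.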
This is the one with 40 clients, 40 threads. Not really an improvement (tps is about 0.5% lower with the patch), and still quite noisy.
<Master>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 40
number of threads: 40
duration: 600 s
number of transactions actually processed: 876846915
latency average = 0.027 ms
latency stddev = 0.015 ms
tps = 1461407.539610 (including connections establishing)
tps = 1461422.084486 (excluding connections establishing)
<Patched>
transaction type: <builtin: select only>
scaling factor: 300
query mode: prepared
number of clients: 40
number of threads: 40
duration: 600 s
number of transactions actually processed: 872633979
latency average = 0.027 ms
latency stddev = 0.038 ms
tps = 1454387.326179 (including connections establishing)
tps = 1454396.879195 (excluding connections establishing)
For tests that don't use the full machine (e.g. 10 clients, 10 threads) I see speedups as well, but not as large as in the single-core run. It seems there are other bottlenecks (on the machine) coming into play.
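For anyone wanting to repeat these runs, the parameters shown in the reports above (select only, prepared query mode, N clients, N threads, 600 s) map onto standard pgbench options roughly as sketched below. The helper name `pgbench_select_only_cmd` and the database name `postgres` are assumptions for illustration, and the sketch only builds and prints the command lines rather than running them. The scaling factor of 300 is set at initialization time (`pgbench -i -s 300`), not on the benchmark run itself.

```python
import shlex


def pgbench_select_only_cmd(clients, threads, seconds=600, dbname="postgres"):
    """Build a pgbench command line for a select-only run in prepared mode.

    dbname is a placeholder; substitute the database that was initialized
    with 'pgbench -i -s 300'.
    """
    return [
        "pgbench",
        "-S",               # built-in select-only script
        "-M", "prepared",   # query mode: prepared
        "-c", str(clients), # number of clients
        "-j", str(threads), # number of worker threads
        "-T", str(seconds), # duration in seconds
        dbname,
    ]


# The client/thread counts mentioned in this message: 1, 10 and 40.
for n in (1, 10, 40):
    print(shlex.join(pgbench_select_only_cmd(n, n)))
```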
-Floris