From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: robertmhaas(at)gmail(dot)com
Cc: ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com
Subject: Re: Protect syscache from bloating with negative cache entries
Date: 2019-11-19 10:48:10
Message-ID: 20191119.194810.255216975235933051.horikyota.ntt@gmail.com
Lists: pgsql-hackers
I'd like to throw in some food for discussion on how much SearchSysCacheN
degrades depending on how we insert code into the SearchSysCacheN code
path.
I ran the attached run2.sh script, which runs catcachebench2(); that
function asks SearchSysCache3() for cached entries (almost) 240000 times
per run. Each output line below shows the mean of 3 runs and the standard
deviation. Lines are sorted by time and edited to fit here. "gen_tbl.pl
| psql" creates a database for the benchmark. catcachebench2() exercises
the shortest of the three paths in the attached benchmark program.
(pg_ctl start)
$ perl gen_tbl.pl | psql ...
(pg_ctl stop)
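For reference, the loop that catcachebench2() measures has roughly the
following shape (an illustrative reconstruction, not the attached
extension code; the cache id, the key arrays and nlookups are
assumptions):

    for (int i = 0; i < nlookups; i++)
    {
        HeapTuple tup;

        /* look up an entry that is already cached, so only the syscache
         * lookup path itself is exercised */
        tup = SearchSysCache3(STATRELATTINH,
                              ObjectIdGetDatum(reloids[i]),
                              Int16GetDatum(attnums[i]),
                              BoolGetDatum(false));
        if (HeapTupleIsValid(tup))
            ReleaseSysCache(tup);
    }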
0. Baseline (0001-benchmark.patch, 0002-Base-change.patch)
First, I built two binaries from literally the same source. For the
benchmark's sake the source is already slightly modified: specifically,
it has SetCatCacheClock, which the benchmark needs but which is not
actually called during this benchmark.
            | time(ms) | stddev(ms)
not patched |  7750.42 |      23.83   # 0.6% faster than 7775.23
not patched |  7864.73 |      43.21
not patched |  7866.80 |     106.47
not patched |  7952.06 |      63.14
master      |  7775.23 |      35.76
master      |  7870.42 |     120.31
master      |  7876.76 |     109.04
master      |  7963.04 |       9.49
So it seems to me that we cannot conclude anything from differences
below about 80 ms (about 1%) for now.
1. Inserting a branch in SearchCatCacheInternal. (CatCache_Pattern_1.patch)
This is the most straightforward way to add an alternative feature.
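For illustration, the inserted branch looks roughly like this (a sketch
only, not the attached patch; the flag, the clock variable and
SearchCatCacheBucket() are stand-in names):

    static inline HeapTuple
    SearchCatCacheInternal(CatCache *cache, int nkeys,
                           Datum v1, Datum v2, Datum v3, Datum v4)
    {
        /* the inserted branch: a test-and-jump on every lookup, doing
         * extra work only when the feature is enabled */
        if (unlikely(catcache_pruning_enabled))
            catcacheclock = GetCurrentTimestamp();

        /* the unchanged lookup logic, factored out here for brevity */
        return SearchCatCacheBucket(cache, nkeys, v1, v2, v3, v4);
    }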
pattern 1 | 8459.73 | 28.15 # 9% (>> 1%) slower than 7757.58
pattern 1 | 8504.83 | 55.61
pattern 1 | 8541.81 | 41.56
pattern 1 | 8552.20 | 27.99
master | 7757.58 | 22.65
master | 7801.32 | 20.64
master | 7839.57 | 25.28
master | 7925.30 | 38.84
It's so slow that it cannot be used.
2. Making SearchCatCacheInternal be an indirect function.
(CatCache_Pattern_2.patch)
Next, I made the workhorse routine be called indirectly. The "inline"
on the function actually lets the compiler optimize the SearchCatCacheN
routines as described in the comment, but the effect doesn't seem to be
so large, at least in this case.
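A minimal sketch of the shape (not the attached patch; the typedef and
the _plain names are stand-ins, and the workhorse body is unchanged and
not shown):

    typedef HeapTuple (*SearchCatCacheInternal_fn) (CatCache *cache, int nkeys,
                                                    Datum v1, Datum v2,
                                                    Datum v3, Datum v4);

    /* existing workhorse, body unchanged (not shown) */
    static HeapTuple SearchCatCacheInternal_plain(CatCache *cache, int nkeys,
                                                  Datum v1, Datum v2,
                                                  Datum v3, Datum v4);

    /* points at the plain implementation by default; an alternative
     * implementation could be installed here instead of branching on
     * every lookup */
    static SearchCatCacheInternal_fn searchCatCacheInternal =
        SearchCatCacheInternal_plain;

    HeapTuple
    SearchCatCache1(CatCache *cache, Datum v1)
    {
        return searchCatCacheInternal(cache, 1, v1, 0, 0, 0);
    }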
pattern 2 | 7976.22 | 46.12 (2.6% slower > 1%)
pattern 2 | 8103.03 | 51.57
pattern 2 | 8144.97 | 68.46
pattern 2 | 8353.10 | 34.89
master | 7768.40 | 56.00
master | 7772.02 | 29.05
master | 7775.05 | 27.69
master | 7830.82 | 13.78
3. Making SearchCatCacheN be indirect functions. (CatCache_Pattern_3.patch)
As far as gcc/Linux/x86 goes, SearchSysCacheN is compiled into the
following instructions:
0x0000000000866c20 <+0>: movslq %edi,%rdi
0x0000000000866c23 <+3>: mov 0xd3da40(,%rdi,8),%rdi
0x0000000000866c2b <+11>: jmpq 0x856ee0 <SearchCatCache3>
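(For reference, SearchSysCache3 in C is roughly the following, asserts
omitted; in an optimized build it reduces to the array load and the tail
call shown above.)

    HeapTuple
    SearchSysCache3(int cacheId, Datum key1, Datum key2, Datum key3)
    {
        return SearchCatCache3(SysCache[cacheId], key1, key2, key3);
    }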
If we make SearchCatCacheN indirect functions, as in the patch, only one
instruction changes:
0x0000000000866c50 <+0>: movslq %edi,%rdi
0x0000000000866c53 <+3>: mov 0xd3da60(,%rdi,8),%rdi
0x0000000000866c5b <+11>: jmpq *0x4c0caf(%rip) # 0xd27910 <SearchCatCache3>
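A minimal sketch of that change (not the attached patch; the _impl name
is a stand-in):

    /* the real implementation, formerly SearchCatCache3 itself */
    extern HeapTuple SearchCatCache3_impl(CatCache *cache,
                                          Datum v1, Datum v2, Datum v3);

    /* SearchCatCache3 becomes a function pointer, so the tail call in
     * SearchSysCache3 turns into the indirect "jmpq *" above */
    HeapTuple (*SearchCatCache3) (CatCache *cache,
                                  Datum v1, Datum v2, Datum v3) =
        SearchCatCache3_impl;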
pattern 3 | 7836.26 | 48.66 (2% slower > 1%)
pattern 3 | 7963.74 | 67.88
pattern 3 | 7966.65 | 101.07
pattern 3 | 8214.57 | 71.93
master | 7679.74 | 62.20
master | 7756.14 | 77.19
master | 7867.14 | 73.33
master | 7893.97 | 47.67
I expected this to run in almost the same time as master. I'm not sure
whether the difference comes from the spectre_v2 mitigation, but the
status of my environment is as follows.
# uname -r
4.18.0-80.11.2.el8_0.x86_64
# cat /proc/cpuinfo
...
model name : Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
stepping : 12
microcode : 0xae
bugs : spectre_v1 spectre_v2 spec_store_bypass mds
# cat /sys/devices/system/cpu/vulnerabilities/spectre_v2
Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: disabled, RSB filling
I am using CentOS 8 and I haven't found a handy (or on-the-fly) way to
disable the mitigations..
Attached are:
0001-benchmark.patch : catcache benchmark extension (and core side fix)
0002-Base-change.patch : baseline change in this series of benchmark
CatCache_Pattern_1.patch: naive branching
CatCache_Pattern_2.patch: indirect SearchCatCacheInternal
CatCache_Pattern_3.patch: indirect SearchCatCacheN
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment               | Content-Type | Size
-------------------------+--------------+-----------
0001-benchmark.patch     | text/x-patch | 11.8 KB
0002-Base-change.patch   | text/x-patch | 3.9 KB
CatCache_Pattern_1.patch | text/x-patch | 557 bytes
CatCache_Pattern_2.patch | text/x-patch | 1.1 KB
CatCache_Pattern_3.patch | text/x-patch | 3.7 KB