From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | hlinnaka(at)iki(dot)fi |
Cc: | robertmhaas(at)gmail(dot)com, ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com, tomas(dot)vondra(at)2ndquadrant(dot)com, tgl(at)sss(dot)pgh(dot)pa(dot)us, andres(at)anarazel(dot)de, tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com, alvherre(at)2ndquadrant(dot)com, bruce(at)momjian(dot)us, pgsql-hackers(at)lists(dot)postgresql(dot)org, michael(dot)paquier(at)gmail(dot)com, david(at)pgmasters(dot)net, craig(at)2ndquadrant(dot)com |
Subject: | Re: Protect syscache from bloating with negative cache entries |
Date: | 2020-11-06 08:24:35 |
Message-ID: | 20201106.172435.2247007593661769380.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thank you for the comment!
First off, I thought that I managed to eliminate the degradation
observed on the previous versions, but significant degradation (1.1%
slower) is still seen in on case.
Anyway, before sending the new patch, let met just answer for the
comments.
At Thu, 5 Nov 2020 11:09:09 +0200, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote in
> On 19/11/2019 12:48, Kyotaro Horiguchi wrote:
> > 1. Inserting a branch in
> > SearchCatCacheInternal. (CatCache_Pattern_1.patch)
> > This is the most straightforward way to add an alternative feature.
> > pattern 1 | 8459.73 | 28.15 # 9% (>> 1%) slower than 7757.58
> > pattern 1 | 8504.83 | 55.61
> > pattern 1 | 8541.81 | 41.56
> > pattern 1 | 8552.20 | 27.99
> > master | 7757.58 | 22.65
> > master | 7801.32 | 20.64
> > master | 7839.57 | 25.28
> > master | 7925.30 | 38.84
> > It's so slow that it cannot be used.
>
> This is very surprising. A branch that's never taken ought to be
> predicted by the CPU's branch-predictor, and be very cheap.
(A) original test patch
I naively thought that the code path is too short to bury the
degradation of additional a few instructions. Actually I measured
performance again with the same patch set on the current master and
had the more or less the same result.
master 8195.58ms, patched 8817.40 ms: +10.75%
However, I noticed that the additional call was a recursive call and a
jmp inserted for the recursive call seems taking significant
time. After avoiding the recursive call, the difference reduced to
+0.96% (master 8268.71ms : patched 8348.30ms)
Just two instructions below are inserted in this case, which looks
reasonable.
8720ff <+31>: cmpl $0xffffffff,0x4ba942(%rip) # 0xd2ca48 <catalog_cache_prune_min_age>
872106 <+38>: jl 0x872240 <SearchCatCache1+352> (call to a function)
(C) inserting bare counter-update code without a branch
> Do we actually need a branch there? If I understand correctly, the
> point is to bump up a usage counter on the catcache entry. You could
> increment the counter unconditionally, even if the feature is not
> used, and avoid the branch that way.
That change causes 4.9% degradation, which is worse than having a
branch.
master 8364.54ms, patched 8666.86ms (+4.9%)
The additional instructions follow.
+ 8721ab <+203>: mov 0x30(%rbx),%eax # %eax = ct->naccess
+ 8721ae <+206>: mov $0x2,%edx
+ 8721b3 <+211>: add $0x1,%eax # %eax++
+ 8721b6 <+214>: cmove %edx,%eax # if %eax == 0 then %eax = 2
<original code>
+ 8721bf <+223>: mov %eax,0x30(%rbx) # ct->naccess = %eax
+ 8721c2 <+226>: mov 0x4cfe9f(%rip),%rax # 0xd42068 <catcacheclock>
+ 8721c9 <+233>: mov %rax,0x38(%rbx) # ct->lastaccess = %rax
(D) naively branching then updateing, again.
Come to think of this, I measured the same with a branch again,
specifically: (It showed siginificant degradation before, in my
memory.)
dlsit_move_head(bucket, &ct->cache_elem);
+ if (catalog_cache_prune_min_age < -1) # never be true
+ {
+ (counter update)
+ }
And I had effectively the same numbers from both master and patched.
master 8066.93ms, patched 8052.37ms (-0.18%)
The above branching inserts the same two instructions with (B) into
different place but the result differs, for a reason uncertain to me.
+ 8721bb <+203>: cmpl $0xffffffff,0x4bb886(%rip) # <catalog_cache_prune_min_age>
+ 8721c2 <+210>: jl 0x872208 <SearchCatCache1+280>
I'm not sure why but the patched beats the master by a small
difference. Anyway ths new result shows that compiler might have got
smarter than before?
(E) bumping up in ReleaseCatCache() (won't work)
> Another thought is to bump up the usage counter in ReleaseCatCache(),
> and only when the refcount reaches zero. That might be somewhat
> cheaper, if it's a common pattern to acquire additional leases on an
> entry that's already referenced.
>
> Yet another thought is to replace 'refcount' with an 'acquirecount'
> and 'releasecount'. In SearchCatCacheInternal(), increment
> acquirecount, and in ReleaseCatCache, increment releasecount. When
> they are equal, the entry is not in use. Now you have a counter that
> gets incremented on every access, with the same number of CPU
> instructions in the hot paths as we have today.
These don't work for negative caches, since the corresponding tuples
are never released.
(F) removing less-significant code.
> Or maybe there are some other ways we could micro-optimize
> SearchCatCacheInternal(), to buy back the slowdown that this feature
Yeah, I thought of that in the beginning. (I removed dlist_move_head()
at the time.) But the most difficult aspect of this approach is that
I cannot tell whether the modification never cause degradation or not.
> would add? For example, you could remove the "if (cl->dead) continue;"
> check, if dead entries were kept out of the hash buckets. Or maybe the
> catctup struct could be made slightly smaller somehow, so that it
> would fit more comfortably in a single cache line.
As a trial, I removed that code and added the ct->naccess code.
master 8187.44ms, patched 8266.74ms (+1.0%)
So the removal decreased the degradation by about 3.9% of the total
time.
> My point is that I don't think we want to complicate the code much for
> this. All the indirection stuff seems over-engineered for this. Let's
> find a way to keep it simple.
Yes, agreed from the bottom of my heart. I aspire to find a simple way
to avoid degradation.
regars.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-11-06 08:29:58 | Re: Protect syscache from bloating with negative cache entries |
Previous Message | osumi.takamichi@fujitsu.com | 2020-11-06 08:06:14 | RE: extension patch of CREATE OR REPLACE TRIGGER |