RE: [Patch] Optimize dropping of relation buffers using dlist

From: "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com>
To: 'Kyotaro Horiguchi' <horikyota(dot)ntt(at)gmail(dot)com>, "tsunakawa(dot)takay(at)fujitsu(dot)com" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: "amit(dot)kapila16(at)gmail(dot)com" <amit(dot)kapila16(at)gmail(dot)com>, "tgl(at)sss(dot)pgh(dot)pa(dot)us" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "andres(at)anarazel(dot)de" <andres(at)anarazel(dot)de>, "robertmhaas(at)gmail(dot)com" <robertmhaas(at)gmail(dot)com>, "tomas(dot)vondra(at)2ndquadrant(dot)com" <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: [Patch] Optimize dropping of relation buffers using dlist
Date: 2020-10-12 09:38:12
Message-ID: OSBPR01MB2341641A3F3782339D4CDB53EF070@OSBPR01MB2341.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, October 9, 2020 11:12 AM, Horiguchi-san wrote:
> I have some comments on the latest patch.

Thank you for the feedback!
I've attached the latest patches.

> visibilitymap_prepare_truncate(Relation rel, BlockNumber nheapblocks)
> {
>     BlockNumber newnblocks;
> +   bool cached;
>
> All the variables added by 0002 are useless because none of the caller
> sites are interested in the value. smgrnblocks should accept NULL as
> isCached. (I agree with Tsunakawa-san that the camel-case name is not
> common there.)
>
> + nForkBlocks[i] = smgrnblocks(smgr_reln, forkNum[i], &isCached);
> +
> + if (!isCached)
>
> "is cached" is not the property the code is interested in, and no other
> callers to smgrnblocks care about it; the need for caching is purely
> internal to smgrnblocks(). On the other hand, we are going to utilize
> the property of "accuracy", which is a byproduct of reducing fseek
> calls, and, again, we are not interested in how it is achieved.
> So I suggest the name should be "accurate" or something else that does
> not suggest the mechanism used under the hood.

I changed the bool param to "accurate" per your suggestion, and I also
removed the additional "bool cached" variables from the modified
functions. NULL is now accepted for the new boolean parameter.
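For reference, here is a simplified, self-contained sketch of how such an optional out-parameter can behave. The types and the physical_nblocks() helper below are stand-ins for illustration only, not the actual smgr code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified sketch of the smgrnblocks() change discussed
 * above: the new bool out-parameter reports whether the returned size is
 * known to be accurate (i.e. it came from the cached value), and callers
 * that do not care may pass NULL.  All types and the physical_nblocks()
 * helper are stand-ins, not the actual PostgreSQL code. */

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

typedef struct SMgrRelationData
{
    /* cached relation size in blocks, or InvalidBlockNumber if unknown */
    BlockNumber smgr_cached_nblocks;
} SMgrRelationData;

/* Stand-in for the lseek()-based size probe of the relation file. */
static BlockNumber
physical_nblocks(SMgrRelationData *reln)
{
    (void) reln;
    return 42;
}

static BlockNumber
smgrnblocks(SMgrRelationData *reln, bool *accurate)
{
    if (accurate)
        *accurate = false;

    if (reln->smgr_cached_nblocks != InvalidBlockNumber)
    {
        if (accurate)
            *accurate = true;
        return reln->smgr_cached_nblocks;
    }

    return physical_nblocks(reln);
}
```

The point is that the caller learns only the "accuracy" property, never whether a cache was involved, matching Horiguchi-san's naming suggestion.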

> + if (nTotalBlocks != InvalidBlockNumber &&
> + nBlocksToInvalidate <
> BUF_DROP_FULL_SCAN_THRESHOLD)
>
> I don't think nTotalBlocks is useful. What we need here is only the
> total blocks for each fork (nForkBlocks[]) and the total number of
> buffers to be invalidated across all forks (nBlocksToInvalidate).

Alright. I also removed nTotalBlocks in the v24-0003 patch. The resulting structure is:

for (i = 0; i < nforks; i++)
{
    if (nForkBlocks[i] != InvalidBlockNumber &&
        nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
    {
        /* optimization loop */
    }
    else
        break;
}
if (i >= nforks)
    return;
/* usual buffer invalidation process */
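To make the fall-through behavior concrete, here is a minimal compilable sketch of just the control flow. This is not the actual DropRelFileNodeBuffers code; the threshold value and the function name are made up, and the real patch does the per-block BufTableLookup() work inside the loop:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Compilable sketch of the fall-through logic above.  The threshold
 * value and the function name are invented for illustration. */

typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)
#define BUF_DROP_FULL_SCAN_THRESHOLD 512    /* placeholder value */

/* Returns true if the optimized path covered every fork; false means
 * the caller must fall back to the usual full buffer-pool scan. */
static bool
drop_buffers_optimized(const BlockNumber *nForkBlocks, int nforks,
                       BlockNumber nBlocksToInvalidate)
{
    int i;

    for (i = 0; i < nforks; i++)
    {
        if (nForkBlocks[i] != InvalidBlockNumber &&
            nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD)
        {
            /* optimization loop: invalidate this fork's blocks
             * individually via buffer-table lookups */
        }
        else
            break;
    }

    /* i only reaches nforks when no fork forced an early break */
    return (i >= nforks);
}
```

A single fork with an unknown size (InvalidBlockNumber), or a total invalidation count at or above the threshold, breaks out early and sends the caller to the full scan.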

> > > > The right side of >= should be cur_block.
> > > Fixed.
> > >= should be =, shouldn't it?
>
> It's just paranoia. What we are going to invalidate is blocks whose
> blockNum >= curBlock. Although actually there's no chance of any other
> process having replaced the buffer with another page (with a lower
> blockid) of the same relation after BufTableLookup(), that condition
> makes sure no blocks that should be invalidated are left behind.
> Sorry. What we are going to invalidate is blocks with blockNum >=
> firstDelBlock[i]. So what I wanted to suggest was that the condition
> should be
>
> +    if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> +        bufHdr->tag.forkNum == forkNum[j] &&
> +        bufHdr->tag.blockNum >= firstDelBlock[j])

I used bufHdr->tag.blockNum >= firstDelBlock[i] in the latest patch.
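As a sanity check of that predicate, a minimal sketch follows. RelFileNodeEquals is simplified here to a plain integer comparison and BufferTag is reduced to three fields; neither matches the real PostgreSQL definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the real BufferTag / RelFileNode types. */
typedef uint32_t BlockNumber;

typedef struct BufferTag
{
    int         rnode;       /* stand-in for RelFileNode */
    int         forkNum;
    BlockNumber blockNum;
} BufferTag;

/* A buffer is invalidated when it belongs to the dropped relation and
 * fork, and its block number is at or past the first deleted block. */
static bool
buffer_needs_invalidation(const BufferTag *tag, int rnode, int forkNum,
                          BlockNumber firstDelBlock)
{
    return tag->rnode == rnode &&
           tag->forkNum == forkNum &&
           tag->blockNum >= firstDelBlock;   /* >= catches the whole tail */
}
```

The >= (rather than ==) is the paranoia discussed above: it guarantees every block from firstDelBlock to the end of the fork is caught, not just the one block that was looked up.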

> > Please measure and let us see just the recovery performance again,
> > because the critical part of the patch was modified. If the
> > performance is as good as the previous one, and there's no review
> > interaction with others in progress, I'll mark the patch as ready
> > for committer in a few days.
>
> The performance is expected to be unchanged since smgrnblocks() is
> called in a non-hot code path, and it is called at most four times per
> buffer drop in this patch. But it's better to make sure.

When I repeated the performance measurement for the non-recovery case,
I got similar execution results for both master and patched.

Execution Time (in seconds)
| s_b   | master | patched | %reg   |
|-------|--------|---------|--------|
| 128MB | 15.265 | 14.769  | -3.36% |
| 1GB   | 14.808 | 14.618  | -1.30% |
| 20GB  | 24.673 | 24.425  | -1.02% |
| 100GB | 74.298 | 74.813  |  0.69% |

Note that for this measurement I removed the recovery-related check from
the patch and executed the commands on a standalone server:
- if (InRecovery && reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)
+ if (reln->smgr_cached_nblocks[forknum] != InvalidBlockNumber)

OTOH, I also measured recovery performance by setting up a hot standby
and executing failover. The results were good and similar to the
previously reported recovery performance.

Recovery Time (in seconds)
| s_b   | master | patched | %reg   |
|-------|--------|---------|--------|
| 128MB |  3.043 |   2.977 | -2.22% |
| 1GB   |  3.417 |   3.410 | -0.21% |
| 20GB  | 20.597 |   2.409 | -755%  |
| 100GB | 66.862 |   2.409 | -2676% |

For 20GB s_b, recovery time dropped from 20.597 s (master) to 2.409 s
(patched); for 100GB s_b, from 66.862 s (master) to 2.409 s (patched).
The optimization mainly benefits large shared_buffers settings, without
compromising performance when shared_buffers is set to the default or a
lower value.

If you could take another look, I'd appreciate any additional feedback
or comments. Thank you for your time.

Regards,
Kirk Jamison

Attachment Content-Type Size
v24-0001-Prevent-invalidating-blocks-in-smgrextend-during.patch application/octet-stream 1.1 KB
v24-0002-Add-bool-param-in-smgrnblocks-for-cached-blocks.patch application/octet-stream 8.7 KB
v24-0003-Optimize-DropRelFileNodeBuffers-during-recovery.patch application/octet-stream 7.2 KB
