Quick Links

Re: [PATCH] Speedup truncates of relation forks

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	"Jamison, Kirk" <k(dot)jamison(at)jp(dot)fujitsu(dot)com>
Cc:	"Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [PATCH] Speedup truncates of relation forks
Date:	2019-06-13 11:01:22
Message-ID:	CAD21AoDjFW7vOYbrw7S5VOv6dKqdcwViCpBcqmzVKof7Pv1B6w@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Thu, Jun 13, 2019 at 6:30 PM Jamison, Kirk <k(dot)jamison(at)jp(dot)fujitsu(dot)com> wrote:
>
> On Wednesday, June 12, 2019 4:29 PM (GMT+9), Masahiko Sawada wrote:
> > On Wed, Jun 12, 2019 at 12:25 PM Tsunakawa, Takayuki
> > <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> wrote:
> > >
> > > From: Tomas Vondra [mailto:tomas(dot)vondra(at)2ndquadrant(dot)com]
> > > > Years ago I've implemented an optimization for many DROP TABLE
> > > > commands in a single transaction - instead of scanning buffers for
> > > > each relation, the code now accumulates a small number of relations
> > > > into an array, and then does a bsearch for each buffer.
> > > >
> > > > Would something like that be applicable/useful here? That is, if we
> > > > do multiple TRUNCATE commands in a single transaction, can we
> > > > optimize it like this?
> > >
> > > Unfortunately not. VACUUM and autovacuum handles each table in a different
> > transaction.
> >
> > We do RelationTruncate() also when we truncate heaps that are created in the
> > current transactions or has a new relfilenodes in the current transaction.
> > So I think there is a room for optimization Thomas suggested, although I'm
> > not sure it's a popular use case.
>
> I couldn't think of a use case too.
>
> > I've not look at this patch deeply but in DropRelFileNodeBuffer I think we
> > can get the min value of all firstDelBlock and use it as the lower bound of
> > block number that we're interested in. That way we can skip checking the array
> > during scanning the buffer pool.
>
> I'll take note of this suggestion.
> Could you help me expound more on this idea, skipping the internal loop by
> comparing the min and buffer descriptor (bufHdr)?
>

Yes. For example,

BlockNumber minBlock = InvalidBlockNumber;
(snip)
/* Get lower bound block number we're interested in */
for (i = 0; i < nforks; i++)
{
if (!BlockNumberIsValid(minBlock) ||
minBlock > firstDelBlock[i])
minBlock = firstDelBlock[i];
}

for (i = 0; i < NBuffers; i++)
{
(snip)
buf_state = LockBufHdr(bufHdr);

/* check with the lower bound and skip the loop */
if (bufHdr->tag.blockNum < minBlock)
{
UnlockBufHdr(bufHdr, buf_state);
continue;
}

for (k = 0; k < nforks; k++)
{
if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
bufHdr->tag.forkNum == forkNum[k] &&
bufHdr->tag.blockNum >= firstDelBlock[k])

But since we acquire the buffer header lock after all and the number
of the internal loops is small (at most 3 for now) the benefit will
not be big.

> In the current patch, I've implemented the following in DropRelFileNodeBuffers:
> for (i = 0; i < NBuffers; i++)
> {
> ...
> buf_state = LockBufHdr(bufHdr);
> for (k = 0; k < nforks; k++)
> {
> if (RelFileNodeEquals(bufHdr->tag.rnode, rnode.node) &&
> bufHdr->tag.forkNum == forkNum[k] &&
> bufHdr->tag.blockNum >= firstDelBlock[k])
> {
> InvalidateBuffer(bufHdr); /* releases spinlock */
> break;
> }
>
> > Don't we use each elements of nblocks for each fork? That is, each fork uses
> > an element at its fork number in the nblocks array and sets InvalidBlockNumber
> > for invalid slots, instead of passing the valid number of elements. That way
> > the following code that exist at many places,
> >
> > blocks[nforks] = visibilitymap_mark_truncate(rel, nblocks);
> > if (BlockNumberIsValid(blocks[nforks]))
> > {
> > forks[nforks] = VISIBILITYMAP_FORKNUM;
> > nforks++;
> > }
> >
> > would become
> >
> > blocks[VISIBILITYMAP_FORKNUM] = visibilitymap_mark_truncate(rel,
> > nblocks);
>
> In the patch, we want to truncate all forks' blocks simultaneously, so
> we optimize the invalidation of buffers and reduce the number of loops
> using those values.
> The suggestion above would have to remove the forks array and its
> forksize (nforks), is it correct? But I think we’d need the fork array
> and nforks to execute the truncation all at once.

I meant that each forks can use the its forknumber'th element of
firstDelBlock[]. For example, if firstDelBlock = {1000,
InvalidBlockNumber, 20, InvalidBlockNumber}, we can invalid buffers
pertaining both greater than block number 1000 of main and greater
than block number 20 of vm. Since firstDelBlock[FSM_FORKNUM] ==
InvalidBlockNumber we don't invalid buffers of fsm.

As Tsunakawa-san mentioned, since your approach would reduce the loop
count your idea might be better than mine which always takes 4 loop
counts.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

RE: [PATCH] Speedup truncates of relation forks at 2019-06-13 09:30:00 from Jamison, Kirk

Responses

RE: [PATCH] Speedup truncates of relation forks at 2019-06-14 00:10:07 from Tsunakawa, Takayuki
RE: [PATCH] Speedup truncates of relation forks at 2019-06-14 01:27:10 from Jamison, Kirk

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	shohei.mochizuki	2019-06-13 11:16:13	RE: BEFORE UPDATE trigger on postgres_fdw table not work
Previous Message	Richard Guo	2019-06-13 10:24:04	Re: Parallel grouping sets