Re: WIP: Avoid creation of the free space map for small tables

From: John Naylor <jcnaylor(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Avoid creation of the free space map for small tables
Date: 2018-10-15 10:39:34
Message-ID: CAJVSVGUrEaKxMLcDNnghxg+j=S3G+wgqjyT020f8tv+iUYDPEA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/15/18, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Few comments on your latest patch:
> -
> +static bool
> +allow_write_to_fsm(Relation rel, BlockNumber heapBlk)
> +{
> + BlockNumber heap_nblocks;
> +
> + if (heapBlk > HEAP_FSM_EXTENSION_THRESHOLD ||
> + rel->rd_rel->relkind != RELKIND_RELATION)
> + return true;
> +
> + /* XXX is this value cached? */
> + heap_nblocks = RelationGetNumberOfBlocks(rel);
> +
> + if (heap_nblocks > HEAP_FSM_EXTENSION_THRESHOLD)
> + return true;
> + else
> + {
> + RelationOpenSmgr(rel);
> + return smgrexists(rel->rd_smgr, FSM_FORKNUM);
> + }
> +}
>
> I think you can avoid calling RelationGetNumberOfBlocks, if you call
> smgrexists before

Okay, I didn't know which was cheaper, but I'll check smgrexists
first. Thanks for the info.

> and for the purpose of vacuum, we can get that as an
> input parameter. I think one can argue for not changing the interface
> functions like RecordPageWithFreeSpace to avoid calling
> RelationGetNumberOfBlocks, but to me, it appears worth to save the
> additional system call.

I agree, and that should be fairly straightforward. I'll include that
in the next patch.

> -
> targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
> -
> - /*
> - * If the FSM knows nothing of the rel, try the last page before we
> - * give up and extend. This avoids one-tuple-per-page syndrome during
> - * bootstrapping or in a recently-started system.
> - */
> if (targetBlock == InvalidBlockNumber)
> - {
> - BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> -
> - if (nblocks > 0)
> - targetBlock = nblocks - 1;
> - }
> + targetBlock = get_page_no_fsm(relation, InvalidBlockNumber,
> + &try_every_page);
>
>
> Is it possible to hide the magic of trying each block within
> GetPageWithFreeSpace? It will simplify the code and in future, if
> another storage API has a different function for
> RelationGetBufferForTuple, it will work seamlessly, provided they are
> using same FSM. One such user is zheap.

Hmm, here I'm a bit more skeptical about the trade offs. That would
mean, in effect, to put a function called get_page_no_fsm() in the FSM
code. ;-) I'm willing to be convinced otherwise, of course, but
here's my reasoning:

For one, we'd have to pass prevBlockNumber and &try_every_block to
GetPageWithFreeSpace() (and RecordAndGetPageWithFreeSpace() by
extension), which are irrelevant to some callers. In addition, in
hio.c, there is a call where we don't want to try any blocks that we
have already, much less all of them:

/*
* Check if some other backend has extended a block for us while
* we were waiting on the lock.
*/
targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);

By the time we get to this call, we likely wouldn't trigger the logic
to try every block, but I don't think we can guarantee that. We could
add a boolean parameter that means "consider trying every block", but
I don't think the FSM code should have so much state passed to it.

Thanks for reviewing,
-John Naylor

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2018-10-15 11:02:17 Refactoring the checkpointer's fsync request queue
Previous Message Amit Langote 2018-10-15 10:34:56 Re: Speeding up INSERTs and UPDATEs to partitioned tables