From: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de> |
Cc: | vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: refactoring relation extension and BufferAlloc(), faster COPY |
Date: | 2023-02-27 16:06:22 |
Message-ID: | f5d8e7d3-a1a0-e8d8-72b0-a6257b036020@iki.fi |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 21/02/2023 18:33, Alvaro Herrera wrote:
> On 2023-Feb-21, Heikki Linnakangas wrote:
>
>>> +static BlockNumber
>>> +BulkExtendSharedRelationBuffered(Relation rel,
>>> + SMgrRelation smgr,
>>> + bool skip_extension_lock,
>>> + char relpersistence,
>>> + ForkNumber fork, ReadBufferMode mode,
>>> + BufferAccessStrategy strategy,
>>> + uint32 *num_pages,
>>> + uint32 num_locked_pages,
>>> + Buffer *buffers)
>>
>> Ugh, that's a lot of arguments, some are inputs and some are outputs. I
>> don't have any concrete suggestions, but could we simplify this somehow?
>> Needs a comment at least.
>
> Yeah, I noticed this too. I think it would be easy enough to add a new
> struct that can be passed as a pointer, which can be stack-allocated
> by the caller, and which holds the input arguments that are common to
> both functions, as is sensible.
We also do this in freespace.c and visibilitymap.c:
/* Extend as needed. */
while (fsm_nblocks_now < fsm_nblocks)
{
PageSetChecksumInplace((Page) pg.data, fsm_nblocks_now);
smgrextend(reln, FSM_FORKNUM, fsm_nblocks_now,
pg.data, false);
fsm_nblocks_now++;
}
We could use the new smgrzeroextend function here. But it would be
better to go through the buffer cache, because after this, the last
block, at 'fsm_nblocks', will be read with ReadBuffer() and modified.
We could use BulkExtendSharedRelationBuffered() to extend the relation
and keep the last page locked, but the
BulkExtendSharedRelationBuffered() signature doesn't allow that. It can
return the first N pages locked, but there's no way to return the *last*
page locked.
Perhaps we should decompose this function into several function calls.
Something like:
/* get N victim buffers, pinned and !BM_VALID */
buffers = BeginExtendRelation(int npages);
LockRelationForExtension(rel)
/* Insert buffers into buffer table */
first_blk = smgrnblocks()
for (blk = first_blk; blk < last_blk; blk++)
MapNewBuffer(blk, buffers[i])
/* extend the file on disk */
smgrzeroextend();
UnlockRelationForExtension(rel)
for (blk = first_blk; blk < last_blk; blk++)
{
memset(BufferGetPage(buffers[i]), 0,
FinishNewBuffer(buffers[i])
/* optionally lock the buffer */
LockBuffer(buffers[i]);
}
That's a lot more verbose, of course, but gives the callers the
flexibility. And might even be more readable than one function call with
lots of arguments.
This would expose the concept of a buffer that's mapped but marked as
IO-in-progress outside bufmgr.c. On one hand, maybe that's exposing
details that shouldn't be exposed. On the other hand, it might come
handy. Instead of RBM_ZERO_AND_LOCK mode, for example, it might be handy
to have a function that returns an IO-in-progress buffer that you can
initialize any way you want.
- Heikki
From | Date | Subject | |
---|---|---|---|
Next Message | Jeroen Vermeulen | 2023-02-27 16:08:09 | Re: libpq: PQgetCopyData() and allocation overhead |
Previous Message | Robert Haas | 2023-02-27 15:50:42 | Re: pg_dump versus hash partitioning |