
Re: Vectored I/O in bulk_write.c

From:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Vectored I/O in bulk_write.c
Date:	2024-03-13 10:18:33
Message-ID:	CA+hUKGKsP+xeRm0TGmVrQ4nPK6CJ=CDGPXH3e5FaffQXZNpNTg@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Mar 13, 2024 at 9:57 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Let's bite the bullet and merge the smgrwrite and smgrextend functions
> at the smgr level too. I propose the following signature:
>
> #define SWF_SKIP_FSYNC 0x01
> #define SWF_EXTEND 0x02
> #define SWF_ZERO 0x04
>
> void smgrwritev(SMgrRelation reln, ForkNumber forknum,
> BlockNumber blocknum,
> const void **buffer, BlockNumber nblocks,
> int flags);
>
> This would replace smgrwrite, smgrextend, and smgrzeroextend. The

That sounds pretty good to me.
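The following toy, in-memory sketch makes the proposed flag semantics concrete. Only the `SWF_*` flag values and the argument order come from the proposal above. `ToyRelation`, `TOY_MAX_BLOCKS` and `toy_smgrwritev()` are hypothetical names, and the rules the sketch enforces are simplifying assumptions, not PostgreSQL's md.c behaviour: an extension must start exactly at the current end, and an overwrite must stay inside the file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ          8192    /* PostgreSQL's default block size */
#define TOY_MAX_BLOCKS  16      /* hypothetical capacity of the toy relation */

/* Flag values as proposed in the thread. */
#define SWF_SKIP_FSYNC  0x01
#define SWF_EXTEND      0x02
#define SWF_ZERO        0x04

typedef uint32_t BlockNumber;

/* Hypothetical in-memory stand-in for one relation fork. */
typedef struct ToyRelation
{
    char        blocks[TOY_MAX_BLOCKS][BLCKSZ];
    BlockNumber nblocks;          /* current size in blocks */
    int         fsync_requests;   /* writes that did not pass SWF_SKIP_FSYNC */
} ToyRelation;

/*
 * Toy version of the proposed smgrwritev(): one entry point for overwrite
 * (no flags), extend (SWF_EXTEND) and zero-extend (SWF_EXTEND | SWF_ZERO).
 * Returns false for requests this toy model rejects.
 */
static bool
toy_smgrwritev(ToyRelation *rel, BlockNumber blocknum,
               const void **buffers, BlockNumber nblocks, int flags)
{
    bool        extend = (flags & SWF_EXTEND) != 0;

    if (blocknum + nblocks > TOY_MAX_BLOCKS)
        return false;
    /* Overwrites must stay inside the existing file... */
    if (!extend && blocknum + nblocks > rel->nblocks)
        return false;
    /* ...and extensions must start at the current end (toy simplification). */
    if (extend && blocknum != rel->nblocks)
        return false;

    for (BlockNumber i = 0; i < nblocks; i++)
    {
        if (flags & SWF_ZERO)
            memset(rel->blocks[blocknum + i], 0, BLCKSZ);  /* buffers unused */
        else
            memcpy(rel->blocks[blocknum + i], buffers[i], BLCKSZ);
    }

    if (extend)
        rel->nblocks = blocknum + nblocks;
    if (!(flags & SWF_SKIP_FSYNC))
        rel->fsync_requests++;
    return true;
}
```

Under this sketch, a former smgrextend() caller passes SWF_EXTEND, a former smgrzeroextend() caller passes SWF_EXTEND | SWF_ZERO with a NULL buffer array, and a plain smgrwrite() passes no flags. The old `skipFsync` boolean becomes SWF_SKIP_FSYNC.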

> > Here also is a first attempt at improving the memory allocation and
> > memory layout.
> > ...
> > +typedef union BufferSlot
> > +{
> > + PGIOAlignedBlock buffer;
> > + dlist_node freelist_node;
> > +} BufferSlot;
> > +
>
> If you allocated the buffers in one large contiguous chunk, you could
> often do one large write() instead of a gathered writev() of multiple
> blocks. That should be even better, although I don't know how much of a
> difference it makes. The above layout wastes a fair amount of memory too,
> because 'buffer' is I/O aligned.
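The following sketch shows why one contiguous allocation helps. When pending pages sit back-to-back in memory, neighbouring iovecs can be merged, and a fully contiguous run collapses to a single iovec that a plain pwrite() can handle. `coalesce_pages()` is a hypothetical helper for illustration, not code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define BLCKSZ 8192

/*
 * Merge pages that are adjacent in memory into one iovec each.
 * 'iov' must have room for npages entries. Returns the resulting iovcnt;
 * a result of 1 means a single pwrite() would do.
 */
static int
coalesce_pages(const void **pages, int npages, struct iovec *iov)
{
    int         iovcnt = 0;

    for (int i = 0; i < npages; i++)
    {
        char       *page = (char *) pages[i];

        if (iovcnt > 0 &&
            (char *) iov[iovcnt - 1].iov_base + iov[iovcnt - 1].iov_len == page)
        {
            /* Page continues the previous run: just grow it. */
            iov[iovcnt - 1].iov_len += BLCKSZ;
        }
        else
        {
            /* Not adjacent: start a new run. */
            iov[iovcnt].iov_base = page;
            iov[iovcnt].iov_len = BLCKSZ;
            iovcnt++;
        }
    }
    return iovcnt;
}
```

With pages taken in order from one array, the result is one iovec. A run that wraps around the end of the array yields two. A pair of pages written out of order inside the run splits it further, so three iovecs can appear.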

The patch I posted has an array of buffers with the properties you
describe, so you get a pwrite() (no 'v') sometimes, and a pwritev()
with a small iovcnt when it wraps around:

pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)

Hmm, I expected pwrite() alternating with pwritev(iovcnt=2), the
latter for when it wraps around the buffer array, so I'm not sure why it's
3. I guess the btree code isn't writing them strictly monotonically or
something...
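Each line in the trace writes 0x20000 = 131072 bytes, i.e. 16 blocks of 8192 bytes. For a strictly monotonic run held in a circular array of page slots, the expectation of at most two iovecs follows from the index arithmetic. The run either fits before the end of the ring (one memory range) or splits into a tail and a head (two ranges). `flush_ring()` and the ring size `NSLOTS` = 32 are hypothetical; the patch's actual slot count is not stated in this message.

```c
#define _DEFAULT_SOURCE         /* expose pwritev() in glibc headers */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#define BLCKSZ  8192
#define NSLOTS  32              /* hypothetical ring size */

/*
 * Write 'count' (<= NSLOTS) consecutive blocks held in a ring of page
 * slots, starting at slot 'first', to file offset 'offset'. A run that
 * does not wrap is one memory range (pwrite); a wrapping run is two
 * (pwritev with iovcnt = 2).
 */
static ssize_t
flush_ring(int fd, char (*slots)[BLCKSZ], int first, int count,
           off_t offset, int *iovcnt_out)
{
    struct iovec iov[2];
    int         head = NSLOTS - first;  /* slots left before the ring wraps */

    if (count <= head)
    {
        *iovcnt_out = 1;
        return pwrite(fd, slots[first], (size_t) count * BLCKSZ, offset);
    }

    iov[0].iov_base = slots[first];     /* tail end of the ring */
    iov[0].iov_len = (size_t) head * BLCKSZ;
    iov[1].iov_base = slots[0];         /* wrapped part at the start */
    iov[1].iov_len = (size_t) (count - head) * BLCKSZ;
    *iovcnt_out = 2;
    return pwritev(fd, iov, 2, offset);
}
```

For example, with a descriptor from fileno(tmpfile()), flushing 16 blocks from slot 0 issues one pwrite(). Flushing 16 blocks from slot 24 wraps after 8 blocks and issues one pwritev() with iovcnt = 2. Both return 131072, matching the trace.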

I don't believe it wastes any memory on padding (except a few bytes
wasted by palloc_aligned() before BulkWriteState):

(gdb) p &bulkstate->buffer_slots[0]
$4 = (BufferSlot *) 0x15c731cb4000
(gdb) p &bulkstate->buffer_slots[1]
$5 = (BufferSlot *) 0x15c731cb6000
(gdb) p sizeof(bulkstate->buffer_slots[0])
$6 = 8192
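The gdb output is consistent with simple layout arithmetic. The two slots are 0x2000 = 8192 bytes apart, and 8192 is already a multiple of the 4096-byte I/O alignment, so a union whose largest member is one aligned page needs no tail padding. The sketch below checks this with simplified stand-ins for PGIOAlignedBlock and dlist_node (`ToyIOAlignedBlock`, `ToyDlistNode`, hypothetical names) and assumes 4096-byte I/O alignment. For contrast, it also shows that a struct placing the list link next to the page, rather than overlaying it, would be padded to 12288 bytes.

```c
#include <assert.h>
#include <stdalign.h>

#define BLCKSZ    8192
#define IO_ALIGN  4096          /* assumed to match PG_IO_ALIGN_SIZE */

/* Simplified stand-in for PGIOAlignedBlock: one I/O-aligned page. */
typedef struct ToyIOAlignedBlock
{
    alignas(IO_ALIGN) char data[BLCKSZ];
} ToyIOAlignedBlock;

/* Simplified stand-in for dlist_node. */
typedef struct ToyDlistNode
{
    struct ToyDlistNode *prev;
    struct ToyDlistNode *next;
} ToyDlistNode;

/* Union as in the patch: a free slot reuses its page memory for the link. */
typedef union ToyBufferSlot
{
    ToyIOAlignedBlock buffer;
    ToyDlistNode freelist_node;
} ToyBufferSlot;

/* Contrast: keeping the link beside the page forces padding. */
typedef struct ToyPaddedSlot
{
    ToyIOAlignedBlock buffer;
    ToyDlistNode freelist_node;
} ToyPaddedSlot;

/* One page, already a multiple of the alignment: no padding. */
static_assert(sizeof(ToyBufferSlot) == BLCKSZ, "union slot is exactly one page");
/* 8192 + sizeof(ToyDlistNode), rounded up to the next multiple of 4096. */
static_assert(sizeof(ToyPaddedSlot) == 3 * IO_ALIGN, "struct slot is padded to 12 KiB");
```

In an array of ToyBufferSlot, consecutive slots are therefore exactly one page apart, which is the stride the gdb session shows.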

In response to

Re: Vectored I/O in bulk_write.c at 2024-03-13 08:57:17 from Heikki Linnakangas

Responses

Re: Vectored I/O in bulk_write.c at 2024-03-13 10:22:20 from Heikki Linnakangas
