From: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <hlinnaka(at)iki(dot)fi> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Vectored I/O in bulk_write.c |
Date: | 2024-03-13 10:18:33 |
Message-ID: | CA+hUKGKsP+xeRm0TGmVrQ4nPK6CJ=CDGPXH3e5FaffQXZNpNTg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Mar 13, 2024 at 9:57 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> Let's bite the bullet and merge the smgrwrite and smgrextend functions
> at the smgr level too. I propose the following signature:
>
> #define SWF_SKIP_FSYNC 0x01
> #define SWF_EXTEND 0x02
> #define SWF_ZERO 0x04
>
> void smgrwritev(SMgrRelation reln, ForkNumber forknum,
> BlockNumber blocknum,
> const void **buffer, BlockNumber nblocks,
> int flags);
>
> This would replace smgrwrite, smgrextend, and smgrzeroextend. The
That sounds pretty good to me.
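To make the flag semantics concrete, here is a minimal toy sketch of how one entry point could cover the three existing calls. This is not PostgreSQL code: ToyFork, toy_writev, TOY_BLCKSZ and TOY_MAXBLKS are hypothetical names, the "fork" is just an in-memory array, and the rules that SWF_EXTEND must start at the current end and that SWF_ZERO ignores the buffers are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Flag values as proposed in the quoted message. */
#define SWF_SKIP_FSYNC 0x01
#define SWF_EXTEND     0x02
#define SWF_ZERO       0x04

/* Hypothetical toy sizes, far smaller than a real block. */
#define TOY_BLCKSZ  16
#define TOY_MAXBLKS 64

/* Hypothetical in-memory stand-in for one relation fork. */
typedef struct ToyFork
{
    uint32_t      nblocks;                        /* current length in blocks */
    unsigned char data[TOY_MAXBLKS][TOY_BLCKSZ];
} ToyFork;

/*
 * Toy version of a merged write/extend/zeroextend call.
 * Returns 0 on success, -1 if the request is rejected.
 */
int
toy_writev(ToyFork *f, uint32_t blocknum,
           const void **buffers, uint32_t nblocks, int flags)
{
    if (nblocks > TOY_MAXBLKS || blocknum > TOY_MAXBLKS - nblocks)
        return -1;

    if (flags & SWF_EXTEND)
    {
        /* Assumption for this toy: extension starts at the current end. */
        if (blocknum != f->nblocks)
            return -1;
    }
    else if (blocknum + nblocks > f->nblocks)
        return -1;              /* a plain write must hit existing blocks */

    for (uint32_t i = 0; i < nblocks; i++)
    {
        if (flags & SWF_ZERO)
            memset(f->data[blocknum + i], 0, TOY_BLCKSZ);   /* buffers unused */
        else
            memcpy(f->data[blocknum + i], buffers[i], TOY_BLCKSZ);
    }

    if (flags & SWF_EXTEND)
        f->nblocks = blocknum + nblocks;

    /* SWF_SKIP_FSYNC has nothing to act on in an in-memory toy. */
    return 0;
}
```

In this toy, flags == 0 plays the role of smgrwrite, SWF_EXTEND the role of smgrextend, and SWF_EXTEND | SWF_ZERO the role of smgrzeroextend.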
> > Here also is a first attempt at improving the memory allocation and
> > memory layout.
> > ...
> > +typedef union BufferSlot
> > +{
> > + PGIOAlignedBlock buffer;
> > + dlist_node freelist_node;
> > +} BufferSlot;
> > +
>
> If you allocated the buffers in one large contiguous chunk, you could
> often do one large write() instead of a gathered writev() of multiple
> blocks. That should be even better, although I don't know how much of a
> difference it makes. The above layout wastes a fair amount of memory too,
> because 'buffer' is I/O aligned.
The patch I posted has an array of buffers with the properties you
describe, so you get a pwrite() (no 'v') sometimes, and a pwritev()
with a small iovcnt when it wraps around:
pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)
pwritev(...,3,...) = 131072 (0x20000)
pwrite(...) = 131072 (0x20000)
Hmm, I expected pwrite() alternating with pwritev(iovcnt=2), the
latter for when it wraps around the buffer array, so I'm not sure why it's
3. I guess the btree code isn't writing them strictly monotonically or
something...
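For illustration, here is a minimal sketch of the kind of coalescing that produces those counts. It is not the patch's code: build_iov and SLOT_SIZE are hypothetical names, and I'm assuming 16 slots of 8192 bytes, which matches the 131072-byte writes above. Slots that are adjacent in memory and written in ascending order collapse into a single iovec, so the iovec count equals the number of contiguous runs.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define SLOT_SIZE 8192          /* assumed block size */

/*
 * Fill iov[] for writing the given slots, in the given order, out of an
 * array starting at base.  A slot that begins exactly where the previous
 * iovec ends is merged into it.  Returns the number of iovecs used; the
 * caller must supply room for up to count entries.
 */
int
build_iov(char *base, const int *slots, int count, struct iovec *iov)
{
    int iovcnt = 0;

    for (int i = 0; i < count; i++)
    {
        char *p = base + (size_t) slots[i] * SLOT_SIZE;

        if (iovcnt > 0 &&
            (char *) iov[iovcnt - 1].iov_base + iov[iovcnt - 1].iov_len == p)
        {
            /* Contiguous with the previous run: just grow it. */
            iov[iovcnt - 1].iov_len += SLOT_SIZE;
        }
        else
        {
            /* Start a new run. */
            iov[iovcnt].iov_base = p;
            iov[iovcnt].iov_len = SLOT_SIZE;
            iovcnt++;
        }
    }
    return iovcnt;
}
```

With slots 0..15 in order this yields one iovec (a plain pwrite() would do), with 10..15 followed by 0..9 it yields two, and an order such as 8..15, 1..7, 0 yields three. That last case only shows that a non-monotonic order is enough to produce iovcnt=3; it is not a diagnosis of what the btree code actually does.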
I don't believe it wastes any memory on padding (except a few bytes
wasted by palloc_aligned() before BulkWriteState):
(gdb) p &bulkstate->buffer_slots[0]
$4 = (BufferSlot *) 0x15c731cb4000
(gdb) p &bulkstate->buffer_slots[1]
$5 = (BufferSlot *) 0x15c731cb6000
(gdb) p sizeof(bulkstate->buffer_slots[0])
$6 = 8192
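The same thing can be checked at compile time with a standalone sketch. The types below are stand-ins, not the PostgreSQL definitions: AlignedBlock imitates PGIOAlignedBlock, ListNode imitates dlist_node, and the 4096-byte alignment and 8192-byte block size are assumptions. Because the block size is a multiple of the alignment, and the list node is much smaller than a block, the union is exactly one block wide and array elements sit back to back.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define BLCKSZ   8192           /* assumed block size */
#define IO_ALIGN 4096           /* assumed I/O alignment */

/* Stand-in for PGIOAlignedBlock: one block with I/O alignment. */
typedef union AlignedBlock
{
    alignas(IO_ALIGN) char data[BLCKSZ];
} AlignedBlock;

/* Stand-in for dlist_node: two pointers. */
typedef struct ListNode
{
    struct ListNode *prev;
    struct ListNode *next;
} ListNode;

/* Same shape as the BufferSlot union in the quoted patch. */
typedef union BufferSlot
{
    AlignedBlock buffer;
    ListNode     freelist_node;
} BufferSlot;

/* The union costs nothing beyond the block itself. */
static_assert(sizeof(BufferSlot) == BLCKSZ, "slot is exactly one block");
static_assert(alignof(BufferSlot) == IO_ALIGN, "slot keeps I/O alignment");

/* Distance in bytes between consecutive array elements. */
size_t
slot_stride(void)
{
    static BufferSlot slots[2];

    return (size_t) ((char *) &slots[1] - (char *) &slots[0]);
}
```

Here slot_stride() returns 8192, the same 0x2000 step seen between the two gdb addresses above.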