Re: smgrextendv and vectorizing the bulk_write implementation

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: smgrextendv and vectorizing the bulk_write implementation
Date: 2024-11-22 21:24:16
Message-ID: 5544a0e0-1341-477a-83cd-c616cd016388@iki.fi
Lists: pgsql-hackers

On 22/11/2024 19:49, Matthias van de Meent wrote:
> Hi,
>
> While working on the fix for [0] I noticed that bulk_write doesn't use
> any of the new vectorized IO features, which seemed like a waste.
> After looking into it a bit deeper, I noticed the opportunity for
> write vectorization was not very high, as one would expect most
> bulk_write IOs to be smgrextend(), which only does page-sized writes.
> That's something that can be solved, though, and so I started this
> patch.

+1

> I've attached two patches to address these two items:
>
> Patch 1/2 reworks smgrextend to smgrextendv, which does mostly the
> same stuff as the current smgrextend, but operates on multiple pages.
> Patch 2/2 updates bulk_write to make use of smgrwritev,
> smgrzeroextend, and the new smgrextendv API, thus reducing the syscall
> burden in processes that use bulk extend APIs.

Seems straightforward.

Thomas wrote patches earlier to do a similar thing, see
https://www.postgresql.org/message-id/CA%2BhUKGLx5bLwezZKAYB2O_qHj%3Dov10RpgRVY7e8TSJVE74oVjg%40mail.gmail.com.
I haven't looked closely at the patches to see what the differences are.

> Open question:
> In my version of smgrextendv, I reject any failure to extend by the
> requested size. This is different from smgrwrite, which tries to write
> again when FileWriteV returns a short write. Should smgrextendv do
> retries, too?

Hmm, a short write seems just as possible in smgrextendv() too, it
should retry.

In principle you could get a short write even with a BLCKSZ write. We've
always just assumed that it won't happen, or if it does it means you ran
out of disk space. I don't know why we ever assumed that, even though it
has worked in practice. But I think we should stop assuming that going
forward, and always retry short writes.
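
To illustrate what "always retry short writes" means, here is a minimal sketch using plain POSIX pwrite() rather than PostgreSQL's FileWriteV(). The helper name write_all_at() is hypothetical and is not taken from either patch set. Treating a zero-byte write as ENOSPC is an assumption that mirrors the "out of disk space" reasoning above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write exactly "len" bytes
 * from "buf" at "offset", continuing after any short write.
 *
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int
write_all_at(int fd, const char *buf, size_t len, off_t offset)
{
	while (len > 0)
	{
		ssize_t		n = pwrite(fd, buf, len, offset);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted before writing; try again */
			return -1;			/* real error, errno set by pwrite() */
		}
		if (n == 0)
		{
			/* Assumption: no progress at all means the disk is full. */
			errno = ENOSPC;
			return -1;
		}

		/* Short or full write: advance past the bytes that made it out. */
		buf += n;
		len -= (size_t) n;
		offset += n;
	}
	return 0;
}
```

A vectored variant would need the same loop, plus adjusting the iovec array to skip buffers that were already written in full and to trim the one that was written partially.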

--
Heikki Linnakangas
Neon (https://neon.tech)
