Re: Large block sizes support in Linux

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Pankaj Raghav <kernel(at)pankajraghav(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, mcgrof(at)kernel(dot)org, gost(dot)dev(at)samsung(dot)com
Subject: Re: Large block sizes support in Linux
Date: 2024-03-25 20:19:00
Message-ID: ZgHcNGJVfE7-UkAG@momjian.us
Lists: pgsql-hackers

On Mon, Mar 25, 2024 at 02:53:56PM +0100, Pankaj Raghav wrote:
> This is an excellent question that needs a bit of community discussion in order
> to expose a device-agnostic value that userspace can trust.
>
> There might be a talk this year at LSFMM about untorn writes[1] in the buffered IO
> path. I will make sure to bring this question up.
>
> At the moment, Linux also takes atomic write guarantees into account when it exposes
> the physical block size (/sys/block/<dev>/queue/physical_block_size); for NVMe in
> particular, it uses NAWUPF and AWUPF when setting that value.
>
> A system admin could use the value exposed by physical_block_size as a hint to set
> full_page_writes=off. Of course, this also requires the device to give atomic guarantees.
>
> The optimal setup would be DB page size == FS block size == Device atomic size.
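
The check described in the quoted text could be sketched roughly as below.
This is only an illustration, not something from the thread: the helper
names (read_queue_value, sizes_aligned) and the sysfs_root parameter are
made up for the example, and the only sysfs attribute it relies on is the
physical_block_size file mentioned above. A matching value is a hint at
best; as noted, the device still has to give atomic guarantees.

```python
from pathlib import Path


def read_queue_value(device, attribute, sysfs_root="/sys/block"):
    """Read an integer attribute from <sysfs_root>/<device>/queue/<attribute>.

    sysfs_root is a hypothetical parameter added so the function can be
    pointed at a test directory instead of the real sysfs tree.
    """
    path = Path(sysfs_root) / device / "queue" / attribute
    return int(path.read_text().strip())


def sizes_aligned(db_page_size, fs_block_size, device_block_size):
    """Return True when all three sizes (in bytes) are equal.

    This mirrors the "DB page size == FS block size == Device atomic size"
    condition. It says nothing about whether the device really writes a
    block of that size atomically.
    """
    return db_page_size == fs_block_size == device_block_size
```

For example, one might pass the result of
read_queue_value("nvme0n1", "physical_block_size") as the third argument
to sizes_aligned, together with the PostgreSQL block size (8192 bytes by
default) and the file system block size obtained separately. The device
name here is just a placeholder.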

One other thing I remember is that some people modified the ZFS file
system parameters enough that they made Postgres non-durable and
corrupted their database. This is a very hard thing to get right
because the user has very little feedback when they break things.

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.