Re: Initdb-time block size specification

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, David Christensen <david(dot)christensen(at)crunchydata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Initdb-time block size specification
Date: 2023-06-30 22:58:20
Message-ID: ZJ9eDKleZY0Gk7yd@momjian.us
Lists: pgsql-hackers

On Fri, Jun 30, 2023 at 03:51:18PM -0700, Andres Freund wrote:
> > For a 4kB write, to say it is not partially written would be to require
> > the operating system to guarantee that the 4kB write is not split into
> > smaller writes which might each be atomic because smaller atomic writes
> > would not help us.
>
> That's why we're talking about drives with 4k sector size - you *can't* split
> the writes below that.

Okay, good point.

> The problem is that, as far as I know, it's not always obvious what block size
> is being used on the actual storage level. It's not even trivial when
> operating on a filesystem directly stored on a single block device ([1]). Once
> there's things like LVM or disk encryption involved, it gets pretty hairy
> ([2]). Once you know all the block devices, it's not too bad, but ...
>
> Greetings,
>
> Andres Freund
>
> [1] On Linux I think you need to use stat() to figure out the st_dev for a
> file, then look in /proc/self/mountinfo for the block device, use the name
> of the file to look in /sys/block/$d/queue/physical_block_size.

I just got a new server:

https://momjian.us/main/blogs/blog/2023.html#June_28_2023

so I tested this on my new M.2 NVMe storage device:

$ cat /sys/block/nvme0n1/queue/physical_block_size
262144

that's 256k, not 4k.

> [2] The above doesn't work because e.g. a device mapper target might only
> support 4k sectors, even though the sectors on the underlying storage device
> are 512b sectors. E.g. my root filesystem is encrypted, and if you follow the
> above recipe (with the added step of resolving the symlink to know the actual
> device name), you would see a 4k sector size. Even though the underlying NVMe
> disk only supports 512b sectors.

Good point.
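
For reference, the recipe in [1] could be sketched roughly as below. This is
only an illustration, not anything from a patch: the function names
parse_mountinfo() and physical_block_size() are made up, the lookup through
/sys/class/block (instead of /sys/block) is my assumption so that partitions
resolve to their parent disk, and, per [2], the value it returns describes only
the topmost block device, not whatever sits underneath LVM or dm-crypt.

```python
import os


def parse_mountinfo(text):
    """Map 'major:minor' to the mount source listed in mountinfo content."""
    devices = {}
    for line in text.splitlines():
        fields = line.split()
        if "-" not in fields:
            continue
        sep = fields.index("-")
        # fields[2] is major:minor; after the "-" separator come
        # filesystem type, mount source, and superblock options.
        if len(fields) > sep + 2:
            devices[fields[2]] = fields[sep + 2]
    return devices


def physical_block_size(path, sysfs="/sys", mountinfo="/proc/self/mountinfo"):
    """Return the reported physical block size for the device holding path.

    Returns None when the device cannot be identified (for example a
    filesystem whose st_dev does not appear in mountinfo).
    """
    st = os.stat(path)
    key = f"{os.major(st.st_dev)}:{os.minor(st.st_dev)}"
    with open(mountinfo) as f:
        source = parse_mountinfo(f.read()).get(key)
    if source is None or not source.startswith("/dev/"):
        return None
    # Resolve symlinks such as /dev/mapper/<name> to the real device node.
    name = os.path.basename(os.path.realpath(source))
    node = os.path.realpath(os.path.join(sysfs, "class", "block", name))
    # A partition has no queue/ directory of its own; its parent disk does.
    for candidate in (node, os.path.dirname(node)):
        size_file = os.path.join(candidate, "queue", "physical_block_size")
        if os.path.exists(size_file):
            with open(size_file) as f:
                return int(f.read())
    return None
```

Calling physical_block_size() on a file in the data directory would then give
the number the topmost device reports, which, as both the 256k result above and
the dm-crypt case in [2] show, is not necessarily the unit the hardware writes
atomically.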

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.
