From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, David Christensen <david(dot)christensen(at)crunchydata(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net> |
Subject: | Re: Initdb-time block size specification |
Date: | 2023-06-30 22:58:20 |
Message-ID: | ZJ9eDKleZY0Gk7yd@momjian.us |
Lists: | pgsql-hackers |
On Fri, Jun 30, 2023 at 03:51:18PM -0700, Andres Freund wrote:
> > For a 4kB write, saying it is not partially written would require the
> > operating system to guarantee that the 4kB write is not split into
> > smaller writes, each of which might be atomic, because smaller atomic
> > writes would not help us.
>
> That's why we're talking about drives with 4k sector size - you *can't* split
> the writes below that.
Okay, good point.
> The problem is that, as far as I know, it's not always obvious what block size
> is being used on the actual storage level. It's not even trivial when
> operating on a filesystem directly stored on a single block device ([1]). Once
> there's things like LVM or disk encryption involved, it gets pretty hairy
> ([2]). Once you know all the block devices, it's not too bad, but ...
>
> Greetings,
>
> Andres Freund
>
> [1] On Linux I think you need to use stat() to figure out the st_dev for a
> file, then look in /proc/self/mountinfo for the block device, use the name
> of the file to look in /sys/block/$d/queue/physical_block_size.
I just got a new server:
https://momjian.us/main/blogs/blog/2023.html#June_28_2023
so I tested this on my new M.2 NVMe storage device:
$ cat /sys/block/nvme0n1/queue/physical_block_size
262144
that's 256k, not 4k.
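
For reference, the recipe in [1] could be sketched roughly as below. This is only an illustration, not anything PostgreSQL does: the function names `mount_source_for` and `physical_block_size` are made up here, and the sketch assumes Linux with /proc and /sys mounted. It also looks under /sys/class/block rather than /sys/block so that a partition such as nvme0n1p2 can be mapped to its parent disk, which is an addition to the recipe as quoted. It reports whatever the top-level device exposes and makes no attempt to walk through device-mapper or LVM layers.

```python
import os


def mount_source_for(dev_id, mountinfo_text):
    """Return the mount source for a "major:minor" id, or None if absent.

    mountinfo lines look like:
      ID PARENT MAJ:MIN ROOT MOUNTPOINT OPTS [optional...] - FSTYPE SOURCE SUPEROPTS
    """
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] != dev_id:
            continue
        try:
            # The "-" separator follows the six fixed fields and any optional ones.
            sep = fields.index("-", 6)
        except ValueError:
            continue
        if sep + 2 < len(fields):
            return fields[sep + 2]
    return None


def physical_block_size(path, sysfs="/sys", mountinfo="/proc/self/mountinfo"):
    """Best-effort lookup of physical_block_size for the device backing path."""
    st = os.stat(path)
    dev_id = f"{os.major(st.st_dev)}:{os.minor(st.st_dev)}"

    with open(mountinfo) as f:
        source = mount_source_for(dev_id, f.read())
    if source is None:
        raise LookupError(f"no mountinfo entry for device {dev_id}")

    # Resolve symlinks such as /dev/mapper/... to get the kernel device name.
    name = os.path.basename(os.path.realpath(source))

    # /sys/class/block/<name> is a symlink into the device tree; a partition
    # has no queue/ directory of its own, so fall back to its parent disk.
    node = os.path.realpath(os.path.join(sysfs, "class", "block", name))
    if not os.path.isdir(os.path.join(node, "queue")):
        node = os.path.dirname(node)

    with open(os.path.join(node, "queue", "physical_block_size")) as f:
        return int(f.read())
```

Calling `physical_block_size("/path/to/data/directory")` would then return the value the kernel reports for that device, or raise if the filesystem's st_dev has no matching mountinfo entry (which can happen on some filesystems).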
> [2] The above doesn't work because e.g. a device mapper target might only
> support 4k sectors, even though the sectors on the underlying storage device
> are 512b sectors. E.g. my root filesystem is encrypted, and if you follow the
> above recipe (with the added step of resolving the symlink to know the actual
> device name), you would see a 4k sector size. Even though the underlying NVMe
> disk only supports 512b sectors.
Good point.
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.