RE: Draft for basic NUMA observability

From: "Shinoda, Noriyoshi (SXD Japan FSI)" <noriyoshi(dot)shinoda(at)hpe(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>, Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Nazir Bilal Yavuz" <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Draft for basic NUMA observability
Date: 2025-04-07 23:26:26
Message-ID: DM4PR84MB1734308EB741A6ECFF040C27EEAA2@DM4PR84MB1734.NAMPRD84.PROD.OUTLOOK.COM

Hi,

Thanks for developing this great feature.
The manual says that the 'size' column of the pg_shmem_allocations_numa view is 'int4', but the implementation is 'int8'.
The attached small patch fixes the manual.

Regards,
Noriyoshi Shinoda

-----Original Message-----
From: Tomas Vondra <tomas(at)vondra(dot)me>
Sent: Tuesday, April 8, 2025 6:59 AM
To: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>; Andres Freund <andres(at)anarazel(dot)de>; Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>; Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>; PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability

On 4/7/25 23:50, Jakub Wartak wrote:
> On Mon, Apr 7, 2025 at 11:27 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
>>
>> Hi,
>>
>> I've pushed all three parts of v29, with some additional corrections
>> (picked lower OIDs, bumped catversion, fixed commit messages).
>
> Hi Tomas, great, awesome! (this is an awesome feeling)! Thank You for
> such incredible support on the last mile of this and also to Bertrand
> (for persistence!), Andres and Alvaro for lots of babysitting.
>

Glad I could help, thanks for the patch.

>> AFAIK v29 fixed this, the end pointer calculations were wrong. With
>> that it passed for me with/without THP, different block sizes etc.
>
> Yeah, that was a typo, I've started writing about v28, but then in the
> middle of that v29 landed and I still was chasing that finding, I've
> just forgotten to bump this.
>
>> We don't align buffers to os_page_size, we align them to
>> PG_IO_ALIGN_SIZE, which is 4kB or so. And it's determined at compile
>> time, while THP is determined when starting the cluster.
> [..]
>> Right, this is because that's where the THP boundary happens to be.
>> And that one "duplicate" entry is for a buffer that happens to span
>> two pages. This is *exactly* the misalignment of blocks and pages
>> that I was wondering about earlier, and with the fixed endptr
>> calculation we handle that just fine.
>>
>> No opinion on the alignment - maybe we should do that, but it's not
>> something this patch needs to worry about.
>
> Agreed. I was wondering even if there are other drawbacks of the
> situation, but other than not reporting duplicates here in this
> pg_buffercache view, I cannot identify anything worthwhile.
>

Well, the drawback is that accessing the buffer may require hitting two different NUMA nodes. I'm not 100% sure it can actually happen, though.
The buffer should be initialized as a whole, so it should go to the same node. But maybe it could be "split" by THP migration, or something like that.

In any case, that's not caused by this patch, and it's less serious with huge pages - it only affects buffers on the page boundaries. But with the small 4K pages it can happen for *every* buffer.
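
To make the boundary arithmetic concrete, here is a minimal sketch in C. It is not code from the patch: the helper names pages_spanned and count_split_buffers are hypothetical, and the 8kB block size, 4kB alignment and 2MB huge page size used in the note below are assumed values for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of OS pages touched by the byte range [start, start + len).
 * Hypothetical helper for illustration; not taken from the patch.
 */
static uint64_t
pages_spanned(uint64_t start, uint64_t len, uint64_t page_size)
{
    assert(page_size > 0 && len > 0);

    uint64_t first = start / page_size;             /* page of first byte */
    uint64_t last = (start + len - 1) / page_size;  /* page of last byte */

    return last - first + 1;
}

/*
 * Count how many of nbuffers consecutive buffers of size blcksz,
 * laid out starting at offset base, touch more than one OS page.
 */
static uint64_t
count_split_buffers(uint64_t base, uint64_t blcksz,
                    uint64_t nbuffers, uint64_t page_size)
{
    uint64_t split = 0;

    for (uint64_t i = 0; i < nbuffers; i++)
    {
        if (pages_spanned(base + i * blcksz, blcksz, page_size) > 1)
            split++;
    }

    return split;
}
```

Under these assumptions, 256 buffers of 8kB that start 4kB past a 2MB huge page boundary give exactly one split buffer (the one straddling the next 2MB boundary), while the same layout on 4kB pages reports every buffer as spanning two pages.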

regards

--
Tomas Vondra

Attachment Content-Type Size
pg_shmem_allocations_numa_doc_v1.diff application/octet-stream 577 bytes
