Re: Draft for basic NUMA observability

From: Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>
To: Tomas Vondra <tomas(at)vondra(dot)me>
Cc: Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Draft for basic NUMA observability
Date: 2025-04-03 12:42:32
Message-ID: CAKZiRmw4OYiKEFXSAV6jt11SZOU0q1j4CKsZYcDH3qV1_+tN8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 3, 2025 at 2:15 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> Ah, OK. Jakub, can you correct (and double-check) this in the next
> version of the patch?

Done.

> > About v21-0002:
> >
> > === 1
> >
> > I can see that the pg_buffercache_init_entries() helper comments are added in
> > v21-0003 but I think that it would be better to add them in v21-0002
> > (where the helper is actually created).
> >
>
> +1 to that

Done.

> > About v21-0003:
> >
> > === 2
> >
> >> I hear you, attached v21 / 0003 is free of float/double arithmetics
> >> and uses non-float point values.
> >
> > + if (buffers_per_page > 1)
> > + os_page_query_count = NBuffers;
> > + else
> > + os_page_query_count = NBuffers * pages_per_buffer;
> >
> > yeah, that's more elegant. I think that it properly handles the relationships
> > between buffer and page sizes without relying on float arithmetic.
> >
>
> In the review I just submitted, I changed this to
>
> os_page_query_count = Max(NBuffers, NBuffers * pages_per_buffer);
>
> but maybe it's less clear. Feel free to undo my change.

Cool, thanks, will send shortly v23 with this applied.

>
> > === 3
> >
> > + if (buffers_per_page > 1)
> > + {
> >
> > As buffers_per_page does not change, I think I'd put this check outside of the
> > for (i = 0; i < NBuffers; i++) loop, something like:
> >
> > "
> > if (buffers_per_page > 1) {
> > /* BLCKSZ < PAGESIZE: one page hosts many Buffers */
> > for (i = 0; i < NBuffers; i++) {
> > .
> > .
> > .
> > .
> > } else {
> > /* BLCKSZ >= PAGESIZE: Buffer occupies more than one OS page */
> > for (i = 0; i < NBuffers; i++) {
> > .
> > .
>
> I don't know. It's a matter of opinion, but I find the current code
> fairly understandable. Maybe if it meaningfully reduced the code
> nesting, but even with the extra branch we'd still need the for loop.
> I'm not against doing this differently, but I'd have to see how that
> looks. Until then I think it's fine to have the code as is.

v23 will have incorporated Bertrand's idea soon. No hard feelings, but
it's kind of painful to switch like that ;)

> > === 4
> >
> >> That _numa_prepare_ptrs() is unused and will need to be removed,
> >> but we can still move some code there if necessary.
> >
> > Yeah I think that it can be simply removed then.
> >
>
> I'm not particularly attached on having the _ptrs() function, but why
> couldn't it build the os_page_ptrs array as before?

I've removed it in v23, the code for me just didn't have flow...

> > === 5
> >
> >> @Bertrand: do you have anything against pg_shm_allocations_numa
> >> instead of pg_shm_numa_allocations? I don't mind changing it...
> >
> > I think that pg_shm_allocations_numa() is better (given the examples you just
> > shared).

Done.

> > === 6
> >
> >> I find all of those non-user friendly and I'm afraid I won't be able
> >> to pull that alone in time...
> >
> > Maybe we could add a few words in the doc to mention the "multiple nodes per
> > buffer" case? And try to improve it for say 19? Also maybe we should just focus
> > till v21-0003 (and discard v21-0004 for 18).
> >
>
> IMHO it's not enough to paper this over by mentioning it in the docs.

OK

> I'm a bit confused about which patch you suggest to leave out. I think
> 0004 is pg_shmem_allocations_numa, but I think that part is fine, no?
> We've been discussing how to represent nodes for buffers in 0003.

IMHO 0001 + 0004 is good. 0003 is probably the last troublemaker, but
we settled on arrays right?

> I understand it'd require a bit of coding, but AFAICS adding an array
> would be fairly trivial amount of code. Something like
>
> values[i++]
> = PointerGetDatum(construct_array_builtin(nodes, nodecnt, INT4OID));
>
> would do, I think. But if we decide to do the 3-column view, that'd be
> even simpler, I think. AFAICS it means we could mostly ditch the
> pg_buffercache refactoring in patch 0002, and 0003 would get much
> simpler too I think.
>
> If Jakub doesn't have time to work on this, I can take a stab at it
> later today.

I won't be able to even start on this today, so if you have cycles for
please do so...

-J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2025-04-03 13:08:48 Re: libpq support for NegotiateProtocolVersion
Previous Message Bertrand Drouvot 2025-04-03 12:41:48 Re: Draft for basic NUMA observability