Re: Improve monitoring of shared memory allocations

From: Tomas Vondra <tomas(at)vondra(dot)me>
To: Rahila Syed <rahilasyed90(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Improve monitoring of shared memory allocations
Date: 2025-03-24 13:24:17
Message-ID: 3e40eeec-d8bf-4496-854e-485dd901f6a2@vondra.me
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I did a review on v3 of the patch. I see there's been some minor changes
in v4 - I haven't tried to adjust my review, but I assume most of my
comments still apply.

Most of my suggestions are about formatting and/or readability. Some of
the likes (e.g. the pointer arithmetics) got so long pgindent would have
to wrap them, but more importantly really hard to decipher what it does.

I've added a couple "review" commits, actually doing most of what I'm
going to suggest.

1) I don't quite understand why hash_get_shared_size() got renamed to
hash_get_init_size()? Why is that needed? Does it cover only some
initial allocation, and there's additional memory allocated later? And
what's the point of the added const?

2) I find it more readable when ShmemInitStruct() is formatted on
multiple lines. Yes, it's a matter of choice, but also other places do
it this way, I think.

3) The changes in hash_create() are a good example of readability issues
I already mentioned. Consider this line:

curr_offset = (((char *) hashp->hctl) + sizeof(HASHHDR) + (hctl->dsize *
sizeof(HASHSEGMENT)) + (sizeof(HASHBUCKET) * hctl->ssize * nsegs));

First, I'd point out this is not an offset but a pointer, so the
variable name is a bit misleading. But more importantly, I envy those
who can parse this in their head and see if it's correct.

I think it's much better to define a couple macros to calculate parts of
this, essentially a tiny "language" expressing this in a concise way.
The 0002 patch defines

- HASH_ELEMENTS
- HASH_ELEMENT_NEXT
- HASH_SEGMENT_PTR
- HASH_SEGMENT_SIZE
- ...

and then uses that in hash_create(). I believe it's way more readable
like this.

4) I find it a bit strange when a function calculates multiple values,
and then returns them in different ways - one as a return value, one
through a pointer, the way compute_buckets_and_segs() did.

There are cases when it makes sense (e.g. when one of the values is
returned only optionally), but otherwise I think it's better to return
both values in the same way through a pointer. The 0002 patch adjusts
compute_buckets_and_segs() to it like this.

We have a precedent for this - ExecChooseHashTableSize(), which is doing
a very similar thing for sizing hashjoin hash table.

5) Isn't it wrong that PredicateLockShmemInit() removes the ShmemAlloc()
along with calculating the per-element requestSize, but then still does

memset(PredXact->element, 0, requestSize);

and

memset(RWConflictPool->element, 0, requestSize);

with requestSize for the whole allocation? I haven't seen any crashes,
but this seems wrong to me. I believe we can simply zero the whole
allocation right after ShmemInitStruct(), there's no need to do this for
each individual element.

5) InitProcGlobal is another example of hard-to-read code. Admittedly,
it wasn't particularly readable before, but making the lines even longer
does not make it much better ...

I guess adding a couple macros similar to hash_create() would be one way
to improve this (and there's even a review comment in that sense), but I
chose to just do maintain a simple pointer. But if you think it should
be done differently, feel free to do so.

6) I moved the PGProcShmemSize() to be right after ProcGlobalShmemSize()
which seems to be doing a very similar thing, and to not require the
prototype. Minor detail, feel free to undo.

7) I think the PG_CACHE_LINE_SIZE is entirely unrelated to the rest of
the patch, and I see no reason to do it in the same commit. So 0005
removes this change, and 0006 reintroduces it separately.

FWIW I see no justification for why the cache line padding is useful,
and it seems quite unlikely this would benefit anything, consiering it
adds padding between fairly long arrays. What kind of contention can we
get there? I don't get it.

Also, why is the patch adding padding after statusFlags (the last array
allocated in InitProcGlobal) and not between allProcs and xids?

regards

--
Tomas Vondra

Attachment Content-Type Size
v20250324-0001-Account-for-initial-shared-memory-allocate.patch text/x-patch 13.2 KB
v20250324-0002-review.patch text/x-patch 10.3 KB
v20250324-0003-Replace-ShmemAlloc-calls-by-ShmemInitStruc.patch text/x-patch 6.4 KB
v20250324-0004-review.patch text/x-patch 7.5 KB
v20250324-0005-remove-cacheline.patch text/x-patch 1.8 KB
v20250324-0006-add-cacheline-padding-back.patch text/x-patch 2.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2025-03-24 13:43:36 Re: Adding extension default version to \dx
Previous Message Andres Freund 2025-03-24 13:08:05 Re: sinvaladt.c: remove msgnumLock, use atomic operations on maxMsgNum