Quick Links

Re: Improve monitoring of shared memory allocations

From:	Rahila Syed <rahilasyed90(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improve monitoring of shared memory allocations
Date:	2025-03-06 15:58:17
Message-ID:	CAH2L28uL-8EQTRSeyTpW1DqAXsDXRkXCkT1dus2u6p4HYDrxAg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Thank you for the review.

cfbot found a few compiler warnings:
>
> https://cirrus-ci.com/task/6526903542087680
> [16:47:46.964] make -s -j${BUILD_JOBS} clean
> [16:47:47.452] time make -s -j${BUILD_JOBS} world-bin
> [16:49:10.496] lwlock.c: In function ‘CreateLWLocks’:
> [16:49:10.496] lwlock.c:467:22: error: unused variable ‘found’
> [-Werror=unused-variable]
> [16:49:10.496] 467 | bool found;
> [16:49:10.496] | ^~~~~
> [16:49:10.496] cc1: all warnings being treated as errors
> [16:49:10.496] make[4]: *** [<builtin>: lwlock.o] Error 1
> [16:49:10.496] make[3]: *** [../../../src/backend/common.mk:37:
> lmgr-recursive] Error 2
> [16:49:10.496] make[3]: *** Waiting for unfinished jobs....
> [16:49:11.881] make[2]: *** [common.mk:37: storage-recursive] Error 2
> [16:49:11.881] make[2]: *** Waiting for unfinished jobs....
> [16:49:20.195] dynahash.c: In function ‘hash_create’:
> [16:49:20.195] dynahash.c:643:37: error: ‘curr_offset’ may be used
> uninitialized [-Werror=maybe-uninitialized]
> [16:49:20.195] 643 | curr_offset = (((char
> *)curr_offset) + (temp * elementSize));
> [16:49:20.195] |
> ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> [16:49:20.195] dynahash.c:588:23: note: ‘curr_offset’ was declared here
> [16:49:20.195] 588 | void *curr_offset;
> [16:49:20.195] | ^~~~~~~~~~~
> [16:49:20.195] cc1: all warnings being treated as errors
> [16:49:20.196] make[4]: *** [<builtin>: dynahash.o] Error 1
>
>
Fixed these.

>
> > diff --git a/src/backend/utils/hash/dynahash.c
> b/src/backend/utils/hash/dynahash.c
> > index cd5a00132f..5203f5b30b 100644
> > --- a/src/backend/utils/hash/dynahash.c
> > +++ b/src/backend/utils/hash/dynahash.c
> > @@ -120,7 +120,6 @@
> > * a good idea of the maximum number of entries!). For non-shared hash
> > * tables, the initial directory size can be left at the default.
> > */
> > -#define DEF_SEGSIZE 256
> > #define DEF_SEGSIZE_SHIFT 8 /* must be log2(DEF_SEGSIZE) */
> > #define DEF_DIRSIZE 256
>
> Why did you move this to the header? Afaict it's just used in
> hash_get_shared_size(), which is also in dynahash.c?
>
>
Yes. This was accidentally left behind by the previous version of the
patch, so I undid the change.

> > static void register_seq_scan(HTAB *hashp);
> > static void deregister_seq_scan(HTAB *hashp);
> > static bool has_seq_scans(HTAB *hashp);
> > -
> > +static int find_num_of_segs(long nelem, int *nbuckets, long
> num_partitions, long ssize);
> >
> > /*
> > * memory allocation support
>
> You removed a newline here that probably shouldn't be removed.

Fixed this.

>
> > @@ -468,7 +466,11 @@ hash_create(const char *tabname, long nelem, const
> HASHCTL *info, int flags)
> > else
> > hashp->keycopy = memcpy;
> >
> > - /* And select the entry allocation function, too. */
> > + /*
> > + * And select the entry allocation function, too. XXX should this
> also
> > + * Assert that flags & HASH_SHARED_MEM is true, since HASH_ALLOC is
> > + * currently only set with HASH_SHARED_MEM *
> > + */
> > if (flags & HASH_ALLOC)
> > hashp->alloc = info->alloc;
> > else
> > @@ -518,6 +520,7 @@ hash_create(const char *tabname, long nelem, const
> HASHCTL *info, int flags)
> >
> > hashp->frozen = false;
> >
> > + /* Initializing the HASHHDR variables with default values */
> > hdefault(hashp);
> >
> > hctl = hashp->hctl;
>
> I assume these were just observations you made while looking into this?
> They
> seem unrelated to the change itself?

Yes. I removed the first one and left the second one as a code comment.

>
> > @@ -582,7 +585,8 @@ hash_create(const char *tabname, long nelem, const
> HASHCTL *info, int flags)
> > freelist_partitions,
> > nelem_alloc,
> > nelem_alloc_first;
> > -
> > + void *curr_offset;
> > +
> > /*
> > * If hash table is partitioned, give each freelist an
> equal share of
> > * the initial allocation. Otherwise only freeList[0] is
> used.
> > @@ -592,6 +596,20 @@ hash_create(const char *tabname, long nelem, const
> HASHCTL *info, int flags)
> > else
> > freelist_partitions = 1;
> >
> > + /*
> > + * If table is shared, calculate the offset at which to
> find the
> > + * the first partition of elements
> > + */
> > + if (hashp->isshared)
> > + {
> > + int nsegs;
> > + int nbuckets;
> > + nsegs = find_num_of_segs(nelem, &nbuckets,
> hctl->num_partitions, hctl->ssize);
> > +
> > + curr_offset = (((char *) hashp->hctl) +
> sizeof(HASHHDR) + (info->dsize * sizeof(HASHSEGMENT)) +
> > + + (sizeof(HASHBUCKET) * hctl->ssize * nsegs));
> > + }
> > +
>
> Why only do this for shared hashtables? Couldn't we allocate the elments
> together with the rest for non-share hashtables too?
>

I think it is possible to consolidate the allocations for non-shared hash
tables
too. However, initial elements are much smaller in non-shared hash tables
due to
their ease of expansion. Therefore, there is probably less benefit in
trying to do
that for non-shared tables.
In addition, the proposed changes are targeted to improve the monitoring in
pg_shmem_allocations which won't be applicable to non-shared hashtables.
While I believe it is feasible, I am uncertain about the utility of such a
change.

> Seems a bit ugly to go through element_alloc() when pre-allocating.
> Perhaps
> it's the best thing we can do to avoid duplicating code, but it seems worth
> checking if we can do better. Perhaps we could split element_alloc() into
> element_alloc() and element_add() or such? With the latter doing
> everything
> after hashp->alloc().
>
>
Makes sense. I split the element_alloc() into element_alloc() and
element_add().

>
> > -
> > +
> > /*
> > * initialize mutexes if it's a partitioned table
> > */
>
> Spurious change.
>
>
Fixed.

A function called find_num_of_segs() that also sets nbuckets seems a bit
> confusing. I also don't like "find_*", as that sounds like it's searching
> some datastructure, rather than just doing a bit of math.
>

I renamed it to compute_buckets_and_segs(). I am open to better
suggestions.

> segp = (HASHSEGMENT) hashp->alloc(sizeof(HASHBUCKET) *
> hashp->ssize);
> >
> > if (!segp)
>
> Spurious change.
>

Fixed.

> -static int
> > +int
> > next_pow2_int(long num)
> > {
> > if (num > INT_MAX / 2)
> > @@ -1957,3 +1995,31 @@ AtEOSubXact_HashTables(bool isCommit, int
> nestDepth)
> > }
> > }
> > }
>
> Why export this?
>
> It was a stale change, I removed it now

> > diff --git a/src/include/utils/hsearch.h b/src/include/utils/hsearch.h
> > index 932cc4f34d..5e16bd4183 100644
> > --- a/src/include/utils/hsearch.h
> > +++ b/src/include/utils/hsearch.h
> > @@ -151,7 +151,7 @@ extern void hash_seq_term(HASH_SEQ_STATUS *status);
> > extern void hash_freeze(HTAB *hashp);
> > extern Size hash_estimate_size(long num_entries, Size entrysize);
> > extern long hash_select_dirsize(long num_entries);
> > -extern Size hash_get_shared_size(HASHCTL *info, int flags);
> > +extern Size hash_get_shared_size(HASHCTL *info, int flags, long
> init_size);
> > extern void AtEOXact_HashTables(bool isCommit);
> > extern void AtEOSubXact_HashTables(bool isCommit, int nestDepth);
>
> It's imo a bit weird that we have very related logic in
> hash_estimate_size()
> and hash_get_shared_size(). Why do we need to duplicate it?
>
>
hash_estimate_size() estimates using default values and
hash_get_shared_size()
calculates using specific values depending on the flags associated with the
hash
table. For instance, segment_size used by the former is DEF_SEGSIZE and
the latter uses info->ssize which is set when the HASH_SEGMENT flag is true.
Hence, they might return different values for shared memory sizes.

>
> These I'd just combine with the ShmemInitStruct("PredXactList"), by
> allocating
> the additional space. The pointer math is a bit annoying, but it makes much
> more sense to have one entry in pg_shmem_allocations.
>
> Fixed accordingly.

>
> > - (TransactionId *) ShmemAlloc(TotalProcs *
> sizeof(*ProcGlobal->xids));
> > + (TransactionId *) ShmemInitStruct("Proc Transaction Ids",
> TotalProcs * sizeof(*ProcGlobal->xids), &found);
> > MemSet(ProcGlobal->xids, 0, TotalProcs *
> sizeof(*ProcGlobal->xids));
> > - ProcGlobal->subxidStates = (XidCacheStatus *)
> ShmemAlloc(TotalProcs * sizeof(*ProcGlobal->subxidStates));
> > + ProcGlobal->subxidStates = (XidCacheStatus *)
> ShmemInitStruct("Proc Sub-transaction id states", TotalProcs *
> sizeof(*ProcGlobal->subxidStates), &found);
> > MemSet(ProcGlobal->subxidStates, 0, TotalProcs *
> sizeof(*ProcGlobal->subxidStates));
> > - ProcGlobal->statusFlags = (uint8 *) ShmemAlloc(TotalProcs *
> sizeof(*ProcGlobal->statusFlags));
> > + ProcGlobal->statusFlags = (uint8 *) ShmemInitStruct("Proc Status
> Flags", TotalProcs * sizeof(*ProcGlobal->statusFlags), &found);
> > MemSet(ProcGlobal->statusFlags, 0, TotalProcs *
> sizeof(*ProcGlobal->statusFlags));
> >
> > /*
>
> Same.
>
> Although here I'd say it's worth padding the size of each separate
> "allocation" by PG_CACHE_LINE_SIZE.
>

Made this change.

> > - fpPtr = ShmemAlloc(TotalProcs * (fpLockBitsSize + fpRelIdSize));
> > + fpPtr = ShmemInitStruct("Fast path lock arrays", TotalProcs *
> (fpLockBitsSize + fpRelIdSize), &found);
> > MemSet(fpPtr, 0, TotalProcs * (fpLockBitsSize + fpRelIdSize));
> >
> > /* For asserts checking we did not overflow. */
>
> This one might actually make sense to keep separate, depending on the
> configuration it can be reasonably big (max_connection = 1k,
> max_locks_per_transaction=1k results in ~5MB)..
>
> OK

PFA the rebased patches with the above changes.

Kindly let me know your views.

Thank you,
Rahila Syed

Attachment	Content-Type	Size
v2-0002-Replace-ShmemAlloc-calls-by-ShmemInitStruct.patch	application/octet-stream	6.1 KB
v2-0001-Account-for-initial-shared-memory-allocated-by-hash_.patch	application/octet-stream	11.4 KB

In response to

Re: Improve monitoring of shared memory allocations at 2025-03-03 18:07:47 from Andres Freund

Responses

Re: Improve monitoring of shared memory allocations at 2025-03-12 10:46:10 from Rahila Syed

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Jelte Fennema-Nio	2025-03-06 16:00:52	Re: Commitfest app release on Feb 17 with many improvements
Previous Message	Andrew Dunstan	2025-03-06 15:57:32	Re: jsonb_strip_nulls with arrays?