Quick Links

Re: Reducing the chunk header sizes on all memory context types

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	David Rowley <dgrowleyml(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Reducing the chunk header sizes on all memory context types
Date:	2022-08-09 18:44:10
Message-ID:	20220809184410.g6yzu6pvttuphag5@awork3.anarazel.de
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

On 2022-08-09 10:36:57 -0400, Robert Haas wrote:
> On Tue, Aug 9, 2022 at 8:53 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> > I think the patch is now starting to take shape. I've added it to the
> > September commitfest [1].
>
> This is extremely cool. The memory savings are really nice.

> And I also like this:
>
> # Additionally, this commit completely changes the rule that pointers must
> # be directly prefixed by the owning memory context and instead, we now
> # insist that they're directly prefixed by an 8-byte value where the least
> # significant 3-bits are set to a value to indicate which type of memory
> # context the pointer belongs to. Using those 3 bits as an index to a new
> # array which stores the methods for each memory context type, we're now
> # able to pass the pointer given to functions such as pfree and repalloc to
> # the function specific to that context implementation to allow them to
> # devise their own methods of finding the memory context which owns the
> # given allocated chunk of memory.
>
> That seems like a good system.

I'm obviously biased, but I agree.

I think it's fine, given that we can change this at any time, but it's
probably worth to explicitly agree that this will for now restrict us to 8
context methods?

> This part of the commit message might need to be clarified:
>
> # We also add a restriction that block sizes for all 3 of the memory
> # allocators cannot be 1GB or larger. We would be unable to store the
> # number of bytes that the block is offset from the chunk stored beyond this
> #1GB boundary on any block that was larger than 1GB.
>
> Earlier in the commit message, you say that allocations of 1GB or more
> are stored in dedicated blocks. But here you say that blocks can't be
> more than 1GB. Those statements seem to contradict each other. I guess
> you mean block sizes for blocks that contain chunks, or something like
> that?

I would guess so as well.

> diff --git a/src/include/utils/memutils_internal.h b/src/include/utils/memutils_internal.h
> new file mode 100644
> index 0000000000..2dcfdd7ec3
> --- /dev/null
> +++ b/src/include/utils/memutils_internal.h
> @@ -0,0 +1,117 @@
> +/*-------------------------------------------------------------------------
> + *
> + * memutils_internal.h
> + * This file contains declarations for memory allocation utility
> + * functions for internal use.
> + *
> + *
> + * Portions Copyright (c) 2022, PostgreSQL Global Development Group
> + * Portions Copyright (c) 1994, Regents of the University of California
> + *
> + * src/include/utils/memutils_internal.h
> + *
> + *-------------------------------------------------------------------------
> + */
> +
> +#ifndef MEMUTILS_INTERNAL_H
> +#define MEMUTILS_INTERNAL_H
> +
> +#include "utils/memutils.h"
> +
> +extern void *AllocSetAlloc(MemoryContext context, Size size);
> +extern void AllocSetFree(void *pointer);
> [much more]

I really wish I knew of a technique to avoid this kind of thing, allowing to
fill a constant array from different translation units... On the linker level
that should be trivial, but I don't think there's a C way to reach that.

> +/*
> + * MemoryContextMethodID
> + * A unique identifier for each MemoryContext implementation which
> + * indicates the index into the mcxt_methods[] array. See mcxt.c.
> + */
> +typedef enum MemoryContextMethodID
> +{
> + MCTX_ASET_ID = 0,

Is there a reason to reserve 0 here? Practically speaking the 8-byte header
will always contain not just zeroes, but I don't think the patch currently
enforces that. It's probably not worth wasting a "valuable" entry here...

> diff --git a/src/include/utils/memutils_memorychunk.h b/src/include/utils/memutils_memorychunk.h
> new file mode 100644
> index 0000000000..6239cf9008
> --- /dev/null
> +++ b/src/include/utils/memutils_memorychunk.h
> @@ -0,0 +1,185 @@
> +/*-------------------------------------------------------------------------
> + *
> + * memutils_memorychunk.h
> + * Here we define a struct named MemoryChunk which implementations of
> + * MemoryContexts may use as a header for chunks of memory they allocate.
> + *
> + * A MemoryChunk provides a lightweight header which a MemoryContext can use
> + * to store the size of an allocation and a reference back to the block which
> + * the given chunk is allocated on.
> + *
> + * Although MemoryChunks are used by each of our MemoryContexts, other
> + * implementations may choose to implement their own method for storing chunk
> + * headers. The only requirement is that the header end with an 8-byte value
> + * which the least significant 3-bits of are set to the MemoryContextMethodID
> + * of the given context.

Well, there can't be other implementations other than ours. So maybe phrase it
as "future implementations"?

> + * By default, a MemoryChunk is 8 bytes in size, however when
> + * MEMORY_CONTEXT_CHECKING is defined the header becomes 16 bytes in size due
> + * to the additional requested_size field. The MemoryContext may use this
> + * field for whatever they wish, but it is intended to be used for additional
> + * checks which are only done in MEMORY_CONTEXT_CHECKING builds.
> + *
> + * The MemoryChunk contains a uint64 field named 'hdrmask'. This field is
> + * used to encode 4 separate pieces of information. Starting with the least
> + * significant bits of 'hdrmask', the bits of this field as used as follows:
> + *
> + * 1. 3-bits to indicate the MemoryContextMethodID
> + * 2. 1-bit to indicate if the chunk is externally managed (see below)
> + * 3. 30-bits for the amount of memory which was reserved for the chunk
> + * 4. 30-bits for the number of bytes that must be subtracted from the chunk
> + * to obtain the address of the block that the chunk is stored on.
> + *
> + * Because we're limited to a block offset and chunk size of 1GB (30-bits),
> + * any allocation which exceeds this amount must call MemoryChunkSetExternal()
> + * and the MemoryContext must devise its own method for storing the offset for
> + * the block and size of the chunk.

Hm. So really only the first four bits have eactly that layout, correct?
Perhaps that could be clarified somehow?

> + /*
> + * The maximum size for a memory chunk before it must be externally managed.
> + */
> +#define MEMORYCHUNK_MAX_SIZE 0x3FFFFFFF
> +
> + /*
> + * The value to AND onto the hdrmask to determine if it's an externally
> + * managed memory chunk.
> + */
> +#define MEMORYCHUNK_EXTERNAL_BIT (1 << 3)

We should probably have a define for the three bits reserved for the context
id, likely in _internal.h

> @@ -109,6 +112,25 @@ typedef struct AllocChunkData *AllocChunk;
> */
> typedef void *AllocPointer;
>
> +/*
> + * AllocFreelistLink
> + * When pfreeing memory, if we maintain a freelist for the given chunk's
> + * size then we use a AllocFreelistLink to point to the current item in
> + * the AllocSetContext's freelist and then set the given freelist element
> + * to point to the chunk being freed.
> + */
> +typedef struct AllocFreelistLink
> +{
> + MemoryChunk *next;
> +} AllocFreelistLink;

I know we have no agreement on that, but I personally would add
AllocFreelistLink to typedefs.list and then re-pgindent ;)

> /*
> * AllocSetGetChunkSpace
> * Given a currently-allocated chunk, determine the total space
> * it occupies (including all memory-allocation overhead).
> */
> -static Size
> -AllocSetGetChunkSpace(MemoryContext context, void *pointer)
> +Size
> +AllocSetGetChunkSpace(void *pointer)
> {
> - AllocChunk chunk = AllocPointerGetChunk(pointer);
> - Size result;
> + MemoryChunk *chunk = PointerGetMemoryChunk(pointer);
>
> - VALGRIND_MAKE_MEM_DEFINED(chunk, ALLOCCHUNK_PRIVATE_LEN);
> - result = chunk->size + ALLOC_CHUNKHDRSZ;
> - VALGRIND_MAKE_MEM_NOACCESS(chunk, ALLOCCHUNK_PRIVATE_LEN);
> - return result;
> + if (MemoryChunkIsExternal(chunk))
> + {

Hm. We don't mark the chunk header as noaccess anymore? If so, why? I realize
it'd be a bit annoying because there's plenty places that look at it, but I
think it's also a good way to catch errors.

> +static const MemoryContextMethods mcxt_methods[] = {
> + [MCTX_ASET_ID] = {
> + AllocSetAlloc,
> + AllocSetFree,
> + AllocSetRealloc,
> + AllocSetReset,
> + AllocSetDelete,
> + AllocSetGetChunkContext,
> + AllocSetGetChunkSpace,
> + AllocSetIsEmpty,
> + AllocSetStats
> +#ifdef MEMORY_CONTEXT_CHECKING
> + ,AllocSetCheck
> +#endif
> + },
> +
> + [MCTX_GENERATION_ID] = {
> + GenerationAlloc,
> + GenerationFree,
> + GenerationRealloc,
> + GenerationReset,
> + GenerationDelete,
> + GenerationGetChunkContext,
> + GenerationGetChunkSpace,
> + GenerationIsEmpty,
> + GenerationStats
> +#ifdef MEMORY_CONTEXT_CHECKING
> + ,GenerationCheck
> +#endif
> + },
> +
> + [MCTX_SLAB_ID] = {
> + SlabAlloc,
> + SlabFree,
> + SlabRealloc,
> + SlabReset,
> + SlabDelete,
> + SlabGetChunkContext,
> + SlabGetChunkSpace,
> + SlabIsEmpty,
> + SlabStats
> +#ifdef MEMORY_CONTEXT_CHECKING
> + ,SlabCheck
> +#endif
> + },
> +};

Mildly wondering whether we ought to use designated initializers instead,
given we're whacking it around already. Too easy to get the order wrong when
adding new members, and we might want to have optional callbacks too.

Greetings,

Andres Freund

In response to

Re: Reducing the chunk header sizes on all memory context types at 2022-08-09 14:36:57 from Robert Haas

Responses

Re: Reducing the chunk header sizes on all memory context types at 2022-08-09 19:21:57 from Tom Lane
Re: Reducing the chunk header sizes on all memory context types at 2022-08-09 19:36:16 from Alvaro Herrera
Re: Reducing the chunk header sizes on all memory context types at 2022-08-10 06:33:49 from David Rowley

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	David Steele	2022-08-09 18:49:39	Re: moving basebackup code to its own directory
Previous Message	Justin Pryzby	2022-08-09 18:40:16	Re: moving basebackup code to its own directory