Re: Reducing System Allocator Thrashing of ExecutorState to Alleviate FDW-related Performance Degradations

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Reducing System Allocator Thrashing of ExecutorState to Alleviate FDW-related Performance Degradations
Date: 2023-02-25 06:26:58
Message-ID: CAFBsxsEeN2go4+ok00HV4Zx7Sr6OMpZ2-iQr+szFxprVfs7y0A@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 21, 2023 at 2:46 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

> On 2023-02-21 08:33:22 +1300, David Rowley wrote:
> > I am interested in a bump allocator for tuplesort.c. There it would be
> > used in isolation and all the code which would touch pointers
> > allocated by the bump allocator would be self-contained to the
> > tuplesorting code.
> >
> > What use case do you have in mind?
>
> E.g. the whole executor state tree (and likely also the plan tree) should be
> allocated that way. They're never individually freed. But we also allocate
> other things in the same context, and those do need to be individually
> freeable. We could use a separate memory context, but that'd increase memory
> usage in many cases, because there'd be two different blocks being allocated
> from at the same time.

That reminds me of this thread I recently stumbled across about memory
management of prepared statements:

https://www.postgresql.org/message-id/20190726004124.prcb55bp43537vyw%40alap3.anarazel.de

I recently heard of a technique for relative pointers that could enable
tree structures within a single allocation.

If "a" needs to store the location of "b" relative to "a", it would be
calculated like

a = (char *) &b - (char *) &a;

...then to find b again, do

typeof_b* b_ptr;
b_ptr = (typeof_b* ) ((char *) &a + a);

One issue with this naive sketch is that zero would point to one's self,
and it would be better if zero still meant "invalid pointer" so that
memset(0) does the right thing.

Using signed byte-sized offsets as an example, the range is -128 to 127, so
we can call -128 the invalid pointer, or in binary 0b1000_0000.

To interpret a raw zero as invalid, we need an encoding, and here we can
just XOR it:

/* 8-bit case; 0x80 is 0b1000_0000 */
#define Encode(a) ((int8) ((uint8) (a) ^ 0x80))
#define Decode(a) ((int8) ((uint8) (a) ^ 0x80))

Then Encode(-128) == 0 and Decode(0) == -128, so memset(0) does the right
thing: the zeroed value decodes as invalid.

Conversely, this preserves the ability to point to self, if needed:

Encode(0) == -128 and Decode(-128) == 0

...so we can store any relative offset in the range -127..127, as well as
"invalid offset". This extends to larger signed integer types in the
obvious way.
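
To make the byte-sized case concrete, here is a minimal sketch. The names
rel8_encode, rel8_decode and REL8_INVALID are made up for illustration, and it
assumes that narrowing to int8_t wraps the way it does on the usual
two's-complement platforms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: -128 is reserved to mean "invalid offset". */
#define REL8_INVALID INT8_MIN

/*
 * Flip the top bit so that a stored byte of zero stands for REL8_INVALID.
 * The XOR is done on the unsigned representation to sidestep sign
 * extension during integer promotion; the final narrowing cast is assumed
 * to wrap (true on mainstream compilers).
 */
static inline int8_t
rel8_encode(int8_t offset)
{
    return (int8_t) ((uint8_t) offset ^ 0x80);
}

/* XOR with the same mask is its own inverse. */
static inline int8_t
rel8_decode(int8_t stored)
{
    return (int8_t) ((uint8_t) stored ^ 0x80);
}

/* Every possible byte value should survive an encode/decode round trip. */
static void
rel8_check_round_trip(void)
{
    for (int i = INT8_MIN; i <= INT8_MAX; i++)
        assert(rel8_decode(rel8_encode((int8_t) i)) == (int8_t) i);
}
```

With this, a zeroed byte decodes to REL8_INVALID, and an offset of zero (point
to self) is stored as -128.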

Putting the above two calculations together, the math ends up like this,
which can be put into macros:

absolute to relative:
a = Encode((int32) ((char *) &b - (char *) &a));

relative to absolute:
typeof_b* b_ptr;
b_ptr = (typeof_b* ) ((char *) &a + Decode(a));
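
As a rough illustration of how the two conversions could be wrapped up, here
is a sketch with 32-bit offsets and a tiny tree kept in one array. Everything
named here (relptr32, relptr_set, relptr_get, Node, relptr_demo) is
hypothetical and not existing PostgreSQL code. It assumes all nodes live in
the same allocation, and that narrowing to int32_t wraps as on the usual
two's-complement platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t relptr32;       /* hypothetical: encoded relative offset */

#define RELPTR32_MASK UINT32_C(0x80000000)

/* absolute to relative: store where "target" is, relative to "*field" */
static inline void
relptr_set(relptr32 *field, const void *target)
{
    if (target == NULL)
        *field = 0;             /* raw zero means "invalid" */
    else
    {
        ptrdiff_t   off = (const char *) target - (const char *) field;

        /* INT32_MIN is reserved: it would encode to raw zero */
        assert(off > INT32_MIN && off <= INT32_MAX);
        *field = (relptr32) ((uint32_t) off ^ RELPTR32_MASK);
    }
}

/* relative to absolute: returns NULL for the invalid (zeroed) value */
static inline void *
relptr_get(const relptr32 *field)
{
    int32_t     off;

    if (*field == 0)
        return NULL;
    off = (int32_t) ((uint32_t) *field ^ RELPTR32_MASK);
    return (char *) field + off;
}

typedef struct Node
{
    int         value;
    relptr32    left;
    relptr32    right;
} Node;

static int
tree_sum(const Node *n)
{
    if (n == NULL)
        return 0;
    return n->value + tree_sum(relptr_get(&n->left))
        + tree_sum(relptr_get(&n->right));
}

/* Build a three-node tree, copy it byte-for-byte, and walk the copy. */
static int
relptr_demo(void)
{
    Node        src[3];
    Node        dst[3];

    memset(src, 0, sizeof(src));    /* all child links start out invalid */
    src[0].value = 1;
    src[1].value = 2;
    src[2].value = 3;
    relptr_set(&src[0].left, &src[1]);
    relptr_set(&src[0].right, &src[2]);

    memcpy(dst, src, sizeof(src));

    /* the copied links resolve inside the copy, not the original */
    assert(relptr_get(&dst[0].left) == (void *) &dst[1]);
    assert(relptr_get(&dst[1].left) == NULL);
    return tree_sum(&dst[0]);
}
```

The point of the memcpy is that the links hold no absolute addresses, so the
whole tree can be moved or copied as one chunk without any fix-up pass.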

I'm not yet familiar enough with parse/plan/execute trees to know if this
would work or not, but that might be a good thing to look into next cycle.

--
John Naylor
EDB: http://www.enterprisedb.com
