Re: Fix crash when non-creator being an iteration on shared radix tree

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: John Naylor <johncnaylorls(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix crash when non-creator being an iteration on shared radix tree
Date: 2024-12-20 19:16:52
Message-ID: CAD21AoBFJReoXxEyhxKx0+XqwiJjLC98E_Btdn5HacEu1rcj-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 20, 2024 at 2:27 AM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> On Fri, Dec 20, 2024 at 4:12 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 18, 2024 at 10:32 PM John Naylor <johncnaylorls(at)gmail(dot)com> wrote:
>
> > > 2. The iter_context is separate because the creator's new context
> > > could be a bump context which doesn't support pfree. But above we
> > > assume we can pfree in the caller's context. Also, IIUC we only
> > > allocate small iter objects, and it'd be unusual to need more than one
> > > at a time per backend, so it's a bit strange to have an entire context
> > > for that. Since we use a standard pattern of "begin; while(iter);
> > > end;", it seems unlikely that someone will cause a leak because of a
> > > coding mistake in iteration.
>
> v3-0001 allocates the iter data in the caller's context. It's a small
> patch, but still a core behavior change so proposed for master-only. I
> believe your v1 is still the least invasive fix for PG17.

I agree to use v1 for v17.

>
> > > If these tiny admin structs were always, not sometimes, in the callers
> > > current context, I think it would be easier to reason about because
> > > then the creator's passed context would be used only for local memory,
> > > specifically only for leaves and the inner node child contexts.
>
> 0002 does this.
>
> > Fair points. Given that we need only one iterator at a time per
> > backend, it would be simpler if the caller passes the pointer to an
> > iterator that is a stack variable to RT_BEGIN_ITEREATE(). For example,
> > TidStoreBeginIterate() would be like:
> >
> > if (TidStoreIsShared(ts))
> > shared_ts_begin_iterate(ts->tree.shared, &iter->tree_iter.shared);
> > else
> > local_ts_begin_iterate(ts->tree.local, &iter->tree_iter.shared);
>
> Hard for me to tell if it'd be simpler.
>
> > > 3. I was never a fan of trying to second-guess the creator's new
> > > context and instead use slab for fixed-sized leaf allocations. If the
> > > creator passes a bump context, we say "no, no, no, use slab -- it's
> > > good for ya". Let's assume the caller knows what they're doing.
> >
> > That's a valid argument but how can a user use the slab context for
> > leaf allocations?
>
> It's trivial after 0001-02: 0003 removes makes one test use slab as
> the passed context (only 32-bit systems would actually use it
> currently).

These changes make sense to me. Here are a few comments:

RT_RADIX_TREE has 'leaf_context' but it seems that we use it only for
local memory. Do we want to declare only in the !RT_SHMEM case?

---
/*
* Similar to TidStoreCreateLocal() but create a shared TidStore on a
* DSA area. The TID storage will live in the DSA area, and the memory
* context rt_context will have only meta data of the radix tree.
*
* The returned object is allocated in backend-local memory.
*/

We need to update the comment about rt_context too since we no longer
use rt_context for shared tidstore.

>
> Also, with a bit more work we could allow a NULL context for when the
> caller has purposely arranged to use pointer-sized values. Did you see
> any of Heikki's CSN patches? There is a radix tree used as a cache in
> a context where the tree could be created and destroyed frequently.
> Something about the memory blocks seems to have tickled some bad case
> in the glibc allocator, and one less context might be good insurance:
>
> https://www.postgresql.org/message-id/718d1788-b058-40e6-bc37-8f15612b5646%40iki.fi

Will check these patches. Thanks!

Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2024-12-20 19:23:30 Re: Memory leak in WAL sender with pgoutput (v10~)
Previous Message Trey Boudreau 2024-12-20 18:56:20 Discussion on a LISTEN-ALL syntax