Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "ocean_li_996(at)163(dot)com" <ocean_li_996(at)163(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()
Date: 2023-08-23 08:46:32
Message-ID: CAA4eK1LJ=CSsxETs5ydqP58OiWPiwodx=Jqw89LQ7fMrRWqK9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Mon, Aug 21, 2023 at 11:57 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> On 2023-08-21 19:07:16 +0530, Amit Kapila wrote:
> > On Mon, Aug 21, 2023 at 2:35 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > >
> > > On 2023-08-18 04:21:53 +0000, Zhijie Hou (Fujitsu) wrote:
> > > > From ee1dfccc0306812c356c84bbd7e2558f27d7d348 Mon Sep 17 00:00:00 2001
> > > > From: Hou Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
> > > > Date: Thu, 17 Aug 2023 19:29:34 +0800
> > > > Subject: [PATCH v4] cleanup decoding context in error cases
> > > >
> > > > Some of the management functions for logical decoding didn't clean up the
> > > > decoding context when an error occurs during decoding. This can
> > > > result in accessing unfreed resources and assert failure the next time we call
> > > > these functions. Fix it by cleaning up the decoding context and slots in error
> > > > cases.
> > >
> > > I don't think this is the right fix - at all. We shouldn't run arbitrary code
> > > after a failure, which we do by calling the shutdown callback.
> > >
> >
> > But OTOH, it can prevent freeing some global memory like in the case
> > of pgoutput_shutdown which frees some memory allocated in
> > CacheMemoryContext. Also, as pointed out by Hou-San[1], it can lead to
> > unwanted behavior as next time we can access some invalid entries.
>
> To me the code in pgoutput is just bad. For one, it uses a global variable
> for non-global state - instead it should store the data in
> LogicalDecodingContext->output_plugin_private. Secondly, it should not use
> CacheMemoryContext, given that the cache apparently is local to a individual
> pgoutput instance.
>

I agree that this is a better way and we should change the code. I
think we can probably allocate this in RelationSyncCache.

>
> But: Is there actually a danger of invalid entries being accessed? It looks to
> me like pgoutput uses invalidation callbacks etc, which would prevent the
> cache from being stale?
>

Actually, during invalidation, we simply mark entries as invalid, and
during the next access, we validate those by freeing the old stuff and
then build it from scratch. I feel if we do what you said in the
previous point this shouldn't be a problem either.

>
> > The other alternatives to fix are (a) Have some Reset* kind of
> > function (similar to ResetReindexState) to reset the required
> > variables and call at AbortTransaction time.
>
> -1. This makes things global concerns that shouldn't be.
>
> If we really need something to clean this up, I'd look at
> MemoryContextRegisterResetCallback().
>

+1. This sounds like a better idea.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message torikoshia 2023-08-23 09:04:18 Re: pg_rewind WAL segments deletion pitfall
Previous Message Tom Lane 2023-08-22 21:29:25 Re: BUG #18059: Unexpected error 25001 in stored procedure