Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()

From: Andres Freund <andres(at)anarazel(dot)de>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "ocean_li_996(at)163(dot)com" <ocean_li_996(at)163(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: BUG #18055: logical decoding core on AllocateSnapshotBuilder()
Date: 2023-08-20 21:05:45
Message-ID: 20230820210545.plmlk3rnxxgkokkj@awork3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi,

On 2023-08-18 04:21:53 +0000, Zhijie Hou (Fujitsu) wrote:
> From ee1dfccc0306812c356c84bbd7e2558f27d7d348 Mon Sep 17 00:00:00 2001
> From: Hou Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
> Date: Thu, 17 Aug 2023 19:29:34 +0800
> Subject: [PATCH v4] cleanup decoding context in error cases
>
> Some of the management functions for logical decoding didn't clean up the
> decoding context when an error occurs during decoding. This can
> result in accessing unfreed resources and assert failure the next time we call
> these functions. Fix it by cleaning up the decoding context and slots in error
> cases.

I don't think this is the right fix - at all. We shouldn't run arbitrary code
after a failure, which we do by calling the shutdown callback. It's not going
to expect to be called in a context where you're not allowed to do very much.

Consider what would happen if FreeDecodingContext() or such failed - we'd not
have executed InvalidateSystemCaches(), and therefore would continue with
corrupted system caches.

There's also no need to free memory here, that's already done via the memory
context cleanup of the "surrounding" memory context (i.e. CurrentMemoryContext
when StartupDecodingContext() is called)

I think the real problem is that 272248a0c added InitialRunningXacts a global
variable. If it just lived in in struct SnapBuild, this whole thing wouldn't
be an issue? The commit justified this choice with avoiding an ABI breakage -
but isn't that bogus? The struct is private to snapbuild.c. It doesn't live on
disk (that's SnapBuildOnDisk). There's no ABI to preserve?

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message James Inform 2023-08-20 21:12:35 Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.
Previous Message David Rowley 2023-08-20 21:04:39 Re: BUG #18060: Left joining rows using random() function in join condition doesn't work as expected.