Re: Memory context can be its own parent and child in replication command

From: Anthonin Bonnefoy <anthonin(dot)bonnefoy(at)datadoghq(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Memory context can be its own parent and child in replication command
Date: 2025-03-11 09:26:32
Message-ID: CAO6_XqqQxCW7xJvBc=obbrWVVG3DWYJrUg79=YaOmX34XsEmVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 11, 2025 at 1:09 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> It seems to me that you mean 1afe31f03cd2, no?

Yes, that was a bad copy/paste from me.

On Tue, Mar 11, 2025 at 2:13 AM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I dunno about this patch: it seems to me it's doing things exactly
> backwards, namely trying to restore the CurrentMemoryContext after
> a transaction abort to what it was just beforehand. With only
> a slightly different set of suppositions, this is introducing
> use of a deleted context rather than removing it.

Yeah, I'm not super happy about the fix either. This leaves the issue
that CurrentMemoryContext is set to a deleted context, even if
temporarily.

> I'm inclined to suspect that the real problem is somebody installed
> the wrong context as current just before the transaction was started.
> I don't have the energy to look closer right now, though.

The context installed is the replication command context created at
the beginning of exec_replication_command. This context is deleted at
the end of the function while the transaction is still running. Maybe
a better fix would be to not delete the Replication command context
when a transaction was started to handle the export, and switch back
to this context at the start of the next exec_replication_command?
This way, the memory context referenced by priorContext would still be
valid when the transaction is aborted.

This would also require the Replication command context to not be
attached under the MessageContext to avoid being deleted at the end of
the query cycle.

> Independently of where the actual bug is there, it seems not
> nice that it's so easy to bollix the memory context data
> structures. I wonder if we can make that more detectable
> at reasonable cost.

One thing that made it hard to trace the issue was that there's no way
to know if a MemoryContext was deleted or not. Setting the
MemoryContext's node type to T_Invalid during reset could be a way to
tag the context as deleted. And this will be able to leverage the
existing MemoryContextIsValid check and additional
MemoryContextIsValid checks could be added before restoring the
priorContext.

--- a/src/backend/utils/mmgr/mcxt.c
+++ b/src/backend/utils/mmgr/mcxt.c
@@ -527,6 +527,7 @@ MemoryContextDeleteOnly(MemoryContext context)

context->methods->delete_context(context);

+ context->type = T_Invalid;
VALGRIND_DESTROY_MEMPOOL(context);
}

However, when testing this on my mac, it seems to trigger a heap
corruption during initdb.

postgres(9057,0x200690840) malloc: Heap corruption detected, free list
is damaged at 0x60000322bb10
*** Incorrect guard value: 276707563012096
postgres(9057,0x200690840) malloc: *** set a breakpoint in
malloc_error_break to debug

While it ran fine on linux. I didn't have time to check why yet.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2025-03-11 09:46:26 Re: Reducing the log spam
Previous Message Evgeny 2025-03-11 09:12:10 Re: Elimination of the repetitive code at the SLRU bootstrap functions.