Re: pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Ben Chobot <bench(at)silentmedia(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql
Date: 2017-10-05 21:10:58
Message-ID: 20171005211058.z5obk2c2e7zvauk6@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 2017-10-05 11:38:40 -0400, Tom Lane wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> > This looks like a legit bug to me. Andres, any opinions?
>
> I wonder why ReorderBufferCommit does this:
>
> if (using_subtxn)
> BeginInternalSubTransaction("replay");
> else
> StartTransactionCommand();
>
> and then tries to clean that up with this brain-dead-looking sequence:
>
> AbortCurrentTransaction();
>
> /* make sure there's no cache pollution */
> ReorderBufferExecuteInvalidations(rb, txn);
>
> if (using_subtxn)
> RollbackAndReleaseCurrentSubTransaction();
>
> Shouldn't that be something like
>
> if (using_subtxn)
> RollbackAndReleaseCurrentSubTransaction();
> else
> AbortCurrentTransaction();
>
> ? Although by this theory, the using_subtxn path has never worked,
> not even a little bit, which seems somewhat unlikely.

I'm not sure what the problem is that you're seeing? The separation of
AbortCurrentTransaction() and RollbackAndReleaseCurrentSubTransaction()
is so that invalidations get executed outside an xact. The AbortCurrent
will move from TBLOCK_SUBINPROGRESS to TBLOCK_SUBABORT, the release then
from TBLOCK_SUBABORT to the containing transaction's state?

Afaict the exactly same crashing codepath would be reached if
RollbackAndReleaseCurrentSubTransaction() were immediately called,
without a preceding AbortCurrentTransaction().

I don't think that's really the problem. I don't know SPI very well, but
from a quick look it looks to me that the problem is that
AtEOSubXact_SPI() enters the
/*
* If we are aborting a subtransaction and there is an open SPI context
* surrounding the subxact, clean up to prevent memory leakage.
*/
block even if 'found' is false. I need to look more at this, but just
adding a 'found &&' to that if makes things pass.

What we have here is that plpgsql uses SPI to execute
pg_logical_slot_peek_changes(), which internally does *not* use SPI. For
every replayed transaction it starts / finishes an internal
subtransaction, and when called from SPI the SPI subxabort handler acts
on the surrounding spi context.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Sean Chittenden 2017-10-05 21:34:37 Re: Re: [PATCH] BUG #13416: Postgres =?utf-8?Q?>=3D_?=9.3 doesn't use optimized shared memory on Solaris anymore
Previous Message Tom Lane 2017-10-05 15:38:40 Re: pg_logical_slot_peek_changes crashes postgres when called from inside pl/pgsql