Bringing non-atomic operation to a PL that hasn't had it

From: Chapman Flack <jcflack(at)acm(dot)org>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Bringing non-atomic operation to a PL that hasn't had it
Date: 2025-03-28 00:12:47
Message-ID: 67E5E97F.4070103@acm.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

If a maintainer of a 20-year-old PL that never supported non-atomic
operation wants to catch up, it is easy enough to find the documentation
on SPI_connect_ext and SPI_OPT_NONATOMIC, and learn how to determine
the atomic/non-atomic state on entry from the atomic field of
InlineCodeBlock for a DO, or of CallContext for a CALL. So for a start,
noting that flag and passing it on to SPI_connect_ext seems straightforward.

The questions seem to quickly outrun the documentation though. I think I can
safely say a few things:

- Even in non-atomic mode, SPI only allows transaction control via its
dedicated SPI_commit / SPI_rollback functions, and never by passing
a transaction utility command as query text.

That seems unobjectionable once understood - avoid having many ways to
do the same thing, and require use of the dedicated functions for the purpose.

- SPI_execute_extended and SPI_execute_plan_extended *also* have an
allow_nonatomic option. That option does not make them any more tolerant
of a transaction command in query text: they still reject that. But they
will pass the nonatomic state along to a nested PL invocation. So they
still won't accept "ROLLBACK", but by passing true for allow_nonatomic
you can get "DO LANGUAGE plpgsql 'BEGIN ROLLBACK; END;'" past them.

That naturally leads to the question of when I ought to pass allow_nonatomic
to those functions, which seems to be a thornier question than just "did
you pass SPI_OPT_NONATOMIC to SPI_connect_ext?". (Otherwise, why couldn't
SPI just do that bookkeeping itself?)

git blame on the allow_nonatomic option leads to a long discussion thread
on explicitly managing snapshots and such, which got me wondering how far
into those weeds a PL maintainer has to trek to do something that works.

Has anyone here blogged or written more extensively than what's in the docs
on just how to approach bringing old PL code up to date with this feature?

I can confirm one piece of unexpected thorny behavior. Where dosql is
a dead-simple toy language I cooked up[1]:

# do language dosql 'do language plpgsql ''begin rollback; end;''';
ERROR: plancache reference 0x36e1a88 is not owned by resource owner
PL/pgSQL simple expressions
CONTEXT: SQL statement "do language plpgsql 'begin rollback; end;'"

This error is reported after the rollback succeeded, PL/pgSQL is all done
and control has returned to _SPI_execute_plan. So why is anything referring
to a PL/pgSQL resource owner?

Turns out I had passed "do language plpgsql 'begin rollback; end;'" to SPI
as a saved plan. (Why? Because the 20-year-old code I started with saves
plans.) I didn't supply a ResourceOwner of my choice in the call, so
_SPI_execute_plan assigned plan_owner = CurrentResourceOwner before calling
GetCachedPlan.

The work then proceeds smoothly, rollback gets executed, lots of things
go away including the ResourceOwner that plan_owner points to, and in
a fabulous (but, for me, quite repeatable) coincidence, the next thing
to be allocated at that exact address is a new ResourceOwner for
PL/pgSQL simple expressions.

So when control gets back to _SPI_execute_plan and it uses the
technically-now-dangling plan_owner pointer to ask the owner to forget
the plan, instead of a crash it gets the PL/pgSQL simple expressions
owner complaining that it's never heard of the plan it's being asked
to forget.

So I suppose one moral I should take from that is "never pass
allow_nonatomic unless you're sure you're not passing a saved plan"?

Or is it "... you're sure you don't have any saved plans?"

And how many other morals like that lie ahead for me to discover?

Also, would it be worthwhile at all to make this behavior in
_SPI_execute_plan a little less perplexing? It does have plan->saved
and allow_nonatomic right there in scope, so it could easily ereport(
"you've passed allow_nonatomic and a saved plan, so you haven't thought
hard enough about this, come back when you've read document X"), and
that could simplify the learning process. Especially if there is
such a document X somewhere.

Regards,
-Chap

[1]
https://github.com/tada/pljava/blob/feature/REL1_7_STABLE/model/pljava-examples/src/main/java/org/postgresql/pljava/example/polyglot/DoSQL.java

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2025-03-28 00:14:16 Re: Remove restrictions in recursive query
Previous Message Michael Paquier 2025-03-27 23:57:34 Re: [PATCH] PGSERVICEFILE as part of a normal connection string