Re: [HACKERS] Transaction control in procedures

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Transaction control in procedures
Date: 2017-11-14 18:09:26
Message-ID: ecd54a96-71a5-fb21-423f-5a6847363ba8@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/14/17 09:27, Merlin Moncure wrote:
> *) Will it be possible to do operations like this in pl/pgsql?
>
> BEGIN
> SELECT INTO r * FROM foo;
>
> START TRANSACTION; -- perhaps we ought to have a special function
> for this instead (BEGIN is reserved, etc).
> SET transaction_isololation TO serializable;
> ...

Eventually, I don't see why not. Currently, it's not complete.

One detail in your example is that when you enter the procedure, you are
already in a transaction, so you would have to run either COMMIT or
ROLLBACK before the START TRANSACTION.

Also, you can't run SET TRANSACTION ISOLATION through SPI, so one would
have to implement a separate code path for that, but that would just be
a bit of leg work.

> *) Will there be any negative consequences to a procedure running
> with an unbounded run time? For example, something like:
>
> LOOP
> SELECT check_for_stuff_to_do();
>
> IF stuff_to_do
> THEN
> do_stuff();
> ELSE
> PERFORM pg_sleep(1);
> END IF;
> END LOOP;

That procedure doesn't do anything with transactions, so it's just like
a long-running function. Otherwise, it'd just be like long-running
client code managing transactions.

> *) Will pg_cancel_backend() cancel the currently executing statement
> or the procedure? (I guess probably the procedure but I'm curious)

Same as the way it currently works. It will raise an exception, which
will travel up the stack and eventually issue an error or be caught. If
someone knows more specific concerns here I could look into it, but I
don't see any problem.

> *) Will long running procedures be subject to statement timeout (and
> does it apply to the entire procedure)?

See previous item.

> Will they be able to control
> statement_timeout from within the procedure itself?

The statement timeout alarm is set by the top-level execution loop, so
you can't change a statement timeout that is already in progress. But
you could change the GUC and commit it for the next top-level statement.

> *) Will pg_stat_activity show the invoking CALL or the currently
> executing statement? I see a strong argument for showing both of
> these things. although I understand that's out of scope here.

Not different from a function execution, i.e., top-level statement.

> If these questions (especially the first two) come down the correct
> way, then it will mean that I can stop coding in other languages
> (primarily bash) for a fairly large number of cases that I really
> think belong in the database itself. This would really simplify
> coding, some things in bash are really awkward to get right such as a
> mutex to guarantee single script invocation. My only real dependency
> on the operation system environment at that point would be cron to
> step in to the backround daemon process (which would immediately set
> an advisory lock).

Well, some kind of scheduler around procedures would be pretty cool at
some point.

> I'm somewhat surprised that SPI is the point of attack for this
> functionality, but if it works that's really the best case scenario
> (the only downside I can see is that the various out of core pl/s have
> to implement the interface individually).

So I tried different things here, and I'll list them here to explain how
I got there.

Option zero is to not use SPI at all and implement a whole new internal
command execution system. But that would obviously be a large amount of
work, and it would look 85% like SPI, and as it turns out it's not
necessary.

The first thing I tried out what to run transaction control statements
through SPI. That turned out to be very complicated and confusing and
fragile, mainly because of the dichotomy between the internal
subtransaction management that the PLs do and the explicit transaction
control statements on the other hand. It was just a giant unworkable mess.

The next thing I tried was to shut down (SPI_finish) SPI before a
transaction boundary command and restart it (SPI_connect) it afterwards.
That would work in principle, but it would require a fair amount of
work in each PL, because they implicitly rely on SPI (or perhaps are
tangled up with SPI) for memory management.

The setup I finally arrived at was to implement the transaction boundary
commands as SPI API calls and let them internally make sure that only
the appropriate stuff is cleared away at transaction boundaries. This
turned out to be the easiest and cleanest. I have since the last patch
implemented the transaction control capabilities in PL/pgSQL, PL/Perl,
and PL/Tcl, and it was entirely trivial once the details were worked out
as I had shown in PL/Python. I will post an updated patch with this soon.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-11-14 18:12:07 Re: [HACKERS] Proposal: Local indexes for partitioned table
Previous Message Peter Geoghegan 2017-11-14 18:01:05 Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)