| From: | Andres Freund <andres(at)anarazel(dot)de> | 
|---|---|
| To: | Andrey Borodin <amborodin86(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com> | 
| Cc: | Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Transaction timeout | 
| Date: | 2022-12-06 00:22:47 | 
| Message-ID: | 20221206002247.w3qfxdg64f7arl2z@alap3.anarazel.de | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
Hi,
On 2022-12-05 15:41:29 -0800, Andrey Borodin wrote:
> Thanks for looking into this Andres!
>
> On Mon, Dec 5, 2022 at 3:07 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > I'm a bit worried about adding evermore branches and function calls for
> > the processing of single statements. We already spend a noticable
> > percentage of the cycles for a single statement in PostgresMain(), this
> > adds additional overhead.
> >
> > I'm somewhat inclined to think that we need some redesign here before we
> > add more overhead.
> >
> We can cap statement_timeout\idle_session_timeout by the budget of
> transaction_timeout left.
I don't know what you mean by that.
> @@ -3277,6 +3282,7 @@ ProcessInterrupts(void)
>  		 */
>  		lock_timeout_occurred = get_timeout_indicator(LOCK_TIMEOUT, true);
>  		stmt_timeout_occurred = get_timeout_indicator(STATEMENT_TIMEOUT, true);
> +		tx_timeout_occurred = get_timeout_indicator(TRANSACTION_TIMEOUT, true);
>  
>  		/*
>  		 * If both were set, we want to report whichever timeout completed
This doesn't update the preceding comment, btw, which now reads oddly:
		/*
		 * If LOCK_TIMEOUT and STATEMENT_TIMEOUT indicators are both set, we
		 * need to clear both, so always fetch both.
		 */
> > > @@ -1360,6 +1363,16 @@ IdleInTransactionSessionTimeoutHandler(void)
> > >       SetLatch(MyLatch);
> > >  }
> > >
> > > +static void
> > > +TransactionTimeoutHandler(void)
> > > +{
> > > +#ifdef HAVE_SETSID
> > > +     /* try to signal whole process group */
> > > +     kill(-MyProcPid, SIGINT);
> > > +#endif
> > > +     kill(MyProcPid, SIGINT);
> > > +}
> > > +
> >
> > Why does this use signals instead of just setting the latch like
> > IdleInTransactionSessionTimeoutHandler() etc?
>
> I just copied statement_timeout behaviour. As I understand this
> implementation is prefered if the timeout can catch the backend
> running at full steam.
Hm. I'm not particularly convinced by that code. Be that as it may, I
don't think it's a good idea to have one more copy of this code. At
least the patch should wrap the signalling code in a helper.
FWIW, the HAVE_SETSID code originates in:
commit 3ad0728c817bf8abd2c76bd11d856967509b307c
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date:   2006-11-21 20:59:53 +0000
    On systems that have setsid(2) (which should be just about everything except
    Windows), arrange for each postmaster child process to be its own process
    group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
    process group not only the direct child process.  This provides saner behavior
    for archive and recovery scripts; in particular, it's possible to shut down a
    warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
    of SIGQUIT to the startup subprocess will result in killing the waiting
    recovery_command.  Also, this makes Query Cancel and statement_timeout apply
    to scripts being run from backends via system().  (There is no support in the
    core backend for that, but it's widely done using untrusted PLs.)  Per gripe
    from Stephen Harris and subsequent discussion.
> > > diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
> > > index 0081873a72..5229fe3555 100644
> > > --- a/src/bin/pg_dump/pg_backup_archiver.c
> > > +++ b/src/bin/pg_dump/pg_backup_archiver.c
> > > @@ -3089,6 +3089,7 @@ _doSetFixedOutputState(ArchiveHandle *AH)
> > >       ahprintf(AH, "SET statement_timeout = 0;\n");
> > >       ahprintf(AH, "SET lock_timeout = 0;\n");
> > >       ahprintf(AH, "SET idle_in_transaction_session_timeout = 0;\n");
> > > +     ahprintf(AH, "SET transaction_timeout = 0;\n");
> >
> > Hm - why is that the right thing to do?
> Because transaction_timeout has effects of statement_timeout.
I guess it's just following precedent - but it seems a bit presumptuous
to just disable safety settings a DBA might have set up. That makes some
sense for e.g. idle_in_transaction_session_timeout, because I think
e.g. parallel backup can lead to a connection being idle for a bit.
A few more review comments:
> Either way we can do batch function enable_timeouts() instead
> enable_timeout_after().
> Does anything of it make sense?
I'm at least as worried about the various calls *after* the execution of
a statement.
> +		if (tx_timeout_occurred)
> +		{
> +			LockErrorCleanup();
> +			ereport(ERROR,
> +					(errcode(ERRCODE_TRANSACTION_TIMEOUT),
> +					 errmsg("canceling transaction due to transaction timeout")));
> +		}
The number of calls to LockErrorCleanup() here feels wrong - there's
already 8 calls in ProcessInterrupts(). Besides the code duplication I
also think it's not a sane idea to rely on having LockErrorCleanup()
before all the relevant ereport(ERROR)s.
Greetings,
Andres Freund
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Michael Paquier | 2022-12-06 00:22:57 | Re: Add LZ4 compression in pg_dump | 
| Previous Message | Tom Lane | 2022-12-06 00:18:11 | Re: Error-safe user functions |