Quick Links

Re: [PATCH] lock_timeout and common SIGALRM framework

From:	Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>, Ants Aasma <ants(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [PATCH] lock_timeout and common SIGALRM framework
Date:	2012-07-13 20:32:26
Message-ID:	500085DA.5050908@cybertec.at
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

2012-07-12 19:05 keltezéssel, Tom Lane írta:
> Here is a revised version of the timeout-infrastructure patch.
> I whacked it around quite a bit, notably:
>
> * I decided that the most convenient way to handle the initialization
> issue was to combine establishment of the signal handler with resetting
> of the per-process variables. So handle_sig_alarm is no longer global,
> and InitializeTimeouts is called at the places where we used to do
> "pqsignal(SIGALRM, handle_sig_alarm);". I believe this is correct
> because any subprocess that was intending to use SIGALRM must have
> called that before establishing any timeouts.

OK.

> * BTW, doing that exposed the fact that walsender processes were failing
> to establish a SIGALRM signal handler, which is a pre-existing bug,
> because they run a normal authentication transaction during InitPostgres
> and hence need to be able to cope with deadlock and statement timeouts.
> I will do something about back-patching a fix for that.

Wow, my work uncovered a pre-existing bug in PostgreSQL. :-)

> * I ended up putting the RegisterTimeout calls for DEADLOCK_TIMEOUT
> and STATEMENT_TIMEOUT into InitPostgres, ensuring that they'd get
> done in walsender and autovacuum processes. I'm not totally satisfied
> with that, but for sure it didn't work to only establish them in
> regular backends.
>
> * I didn't like the "TimeoutName" nomenclature, because to me "name"
> suggests that the value is a string, not just an enum. So I renamed
> that to TimeoutId.

OK.

> * I whacked around the logic in timeout.c a fair amount, because it
> had race conditions if SIGALRM happened while enabling or disabling
> a timeout. I believe the attached coding is safe, but I'm not totally
> happy with it from a performance standpoint, because it will do two
> setitimer calls (a disable and then a re-enable) in many cases where
> the old code did only one.

Disabling deadlock timeout, a.k.a. disable_sig_alarm(false) in
the original code called setitimer() twice if statement timeout
was still in effect, it was done in CheckStatementTimeout().
Considering this, I think there is no performance problem with
the new code you came up with.

> I think what we ought to do is go ahead and apply this, so that we
> can have the API nailed down, and then we can revisit the internals
> of timeout.c to see if we can't get the performance back up.
> It's clearly a much cleaner design than the old spaghetti logic in
> proc.c, so I think we ought to go ahead with this independently of
> whether the second patch gets accepted.

There is one tiny bit you might have broken. You wrote previously:

> I am also underwhelmed by the "timeout_start" callback function concept.
> In the first place, that's broken enable_timeout, which incorrectly
> assumes that the value it gets must be "now" (see its schedule_alarm
> call). In the second place, it seems fairly likely that callers of
> get_timeout_start would likewise want the clock time at which the
> timeout was enabled, not the timeout_start reference time. (If they
> did want the latter, why couldn't they get it from wherever the callback
> function had gotten it?) I'm inclined to propose that we drop the
> timeout_start concept and instead provide two functions for scheduling
> interrupts:
>
> enable_timeout_after(TimeoutName tn, int delay_ms);
> enable_timeout_at(TimeoutName tn, TimestampTz fin_time);
>
> where you use the former if you want the standard GetCurrentTimestamp +
> n msec calculation, but if you want the stop time calculated in some
> other way, you calculate it yourself and use the second function.

It's okay, but you haven't really followed it with STATEMENT_TIMEOUT:

-----8<----------8<----------8<----------8<----------8<-----
*** 2396,2404 ****
/* Set statement timeout running, if any */
/* NB: this mustn't be enabled until we are within an xact */
if (StatementTimeout > 0)
! enable_sig_alarm(StatementTimeout, true);
else
! cancel_from_timeout = false;

xact_started = true;
}
--- 2397,2405 ----
/* Set statement timeout running, if any */
/* NB: this mustn't be enabled until we are within an xact */
if (StatementTimeout > 0)
! enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
else
! disable_timeout(STATEMENT_TIMEOUT, false);

xact_started = true;
}
-----8<----------8<----------8<----------8<----------8<-----

It means that StatementTimeout losts its precision. It would trigger
in the future counting from "now" instead of counting from
GetCurrentStatementStartTimestamp(). It should be:

enable_timeout_at(STATEMENT_TIMEOUT,
TimestampTzPlusMilliseconds(GetCurrentStatementStartTimestamp(), StatementTimeout));

> I haven't really looked at the second patch yet, but at minimum that
> will need some rebasing to match the API tweaks here.

Yes, I will do that.

Thanks for your review and work.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/

In response to

Re: [PATCH] lock_timeout and common SIGALRM framework at 2012-07-12 17:05:24 from Tom Lane

Responses

Re: [PATCH] lock_timeout and common SIGALRM framework at 2012-07-13 21:37:50 from Boszormenyi Zoltan
Re: [PATCH] lock_timeout and common SIGALRM framework at 2012-07-13 21:49:31 from Tom Lane

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tom Lane	2012-07-13 21:35:06	Re: initdb and fsync
Previous Message	Andrew Dunstan	2012-07-13 20:05:37	isolation check takes a long time