From: | Boszormenyi Zoltan <zb(at)cybertec(dot)at> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Marc Cousin <cousinmarc(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>, Ants Aasma <ants(at)cybertec(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: [PATCH] lock_timeout and common SIGALRM framework |
Date: | 2012-07-13 20:32:26 |
Message-ID: | 500085DA.5050908@cybertec.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
2012-07-12 19:05 keltezéssel, Tom Lane írta:
> Here is a revised version of the timeout-infrastructure patch.
> I whacked it around quite a bit, notably:
>
> * I decided that the most convenient way to handle the initialization
> issue was to combine establishment of the signal handler with resetting
> of the per-process variables. So handle_sig_alarm is no longer global,
> and InitializeTimeouts is called at the places where we used to do
> "pqsignal(SIGALRM, handle_sig_alarm);". I believe this is correct
> because any subprocess that was intending to use SIGALRM must have
> called that before establishing any timeouts.
OK.
> * BTW, doing that exposed the fact that walsender processes were failing
> to establish a SIGALRM signal handler, which is a pre-existing bug,
> because they run a normal authentication transaction during InitPostgres
> and hence need to be able to cope with deadlock and statement timeouts.
> I will do something about back-patching a fix for that.
Wow, my work uncovered a pre-existing bug in PostgreSQL. :-)
> * I ended up putting the RegisterTimeout calls for DEADLOCK_TIMEOUT
> and STATEMENT_TIMEOUT into InitPostgres, ensuring that they'd get
> done in walsender and autovacuum processes. I'm not totally satisfied
> with that, but for sure it didn't work to only establish them in
> regular backends.
>
> * I didn't like the "TimeoutName" nomenclature, because to me "name"
> suggests that the value is a string, not just an enum. So I renamed
> that to TimeoutId.
OK.
> * I whacked around the logic in timeout.c a fair amount, because it
> had race conditions if SIGALRM happened while enabling or disabling
> a timeout. I believe the attached coding is safe, but I'm not totally
> happy with it from a performance standpoint, because it will do two
> setitimer calls (a disable and then a re-enable) in many cases where
> the old code did only one.
Disabling deadlock timeout, a.k.a. disable_sig_alarm(false) in
the original code called setitimer() twice if statement timeout
was still in effect, it was done in CheckStatementTimeout().
Considering this, I think there is no performance problem with
the new code you came up with.
> I think what we ought to do is go ahead and apply this, so that we
> can have the API nailed down, and then we can revisit the internals
> of timeout.c to see if we can't get the performance back up.
> It's clearly a much cleaner design than the old spaghetti logic in
> proc.c, so I think we ought to go ahead with this independently of
> whether the second patch gets accepted.
There is one tiny bit you might have broken. You wrote previously:
> I am also underwhelmed by the "timeout_start" callback function concept.
> In the first place, that's broken enable_timeout, which incorrectly
> assumes that the value it gets must be "now" (see its schedule_alarm
> call). In the second place, it seems fairly likely that callers of
> get_timeout_start would likewise want the clock time at which the
> timeout was enabled, not the timeout_start reference time. (If they
> did want the latter, why couldn't they get it from wherever the callback
> function had gotten it?) I'm inclined to propose that we drop the
> timeout_start concept and instead provide two functions for scheduling
> interrupts:
>
> enable_timeout_after(TimeoutName tn, int delay_ms);
> enable_timeout_at(TimeoutName tn, TimestampTz fin_time);
>
> where you use the former if you want the standard GetCurrentTimestamp +
> n msec calculation, but if you want the stop time calculated in some
> other way, you calculate it yourself and use the second function.
It's okay, but you haven't really followed it with STATEMENT_TIMEOUT:
-----8<----------8<----------8<----------8<----------8<-----
*** 2396,2404 ****
/* Set statement timeout running, if any */
/* NB: this mustn't be enabled until we are within an xact */
if (StatementTimeout > 0)
! enable_sig_alarm(StatementTimeout, true);
else
! cancel_from_timeout = false;
xact_started = true;
}
--- 2397,2405 ----
/* Set statement timeout running, if any */
/* NB: this mustn't be enabled until we are within an xact */
if (StatementTimeout > 0)
! enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout);
else
! disable_timeout(STATEMENT_TIMEOUT, false);
xact_started = true;
}
-----8<----------8<----------8<----------8<----------8<-----
It means that StatementTimeout losts its precision. It would trigger
in the future counting from "now" instead of counting from
GetCurrentStatementStartTimestamp(). It should be:
enable_timeout_at(STATEMENT_TIMEOUT,
TimestampTzPlusMilliseconds(GetCurrentStatementStartTimestamp(), StatementTimeout));
> I haven't really looked at the second patch yet, but at minimum that
> will need some rebasing to match the API tweaks here.
Yes, I will do that.
Thanks for your review and work.
Best regards,
Zoltán Böszörményi
--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
http://www.postgresql.at/
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2012-07-13 21:35:06 | Re: initdb and fsync |
Previous Message | Andrew Dunstan | 2012-07-13 20:05:37 | isolation check takes a long time |