From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Spin Lock sleep resolution |
Date: | 2013-04-02 03:07:40 |
Message-ID: | CAMkU=1xhp090HA-sdRmu=RwUtYRa66t4jCnUG2K2dhvF1EiZ5w@mail.gmail.com |
Lists: | pgsql-hackers |
While stracing a process that was contending for a spinlock, I noticed that
the sleep times would often come as a longish run of 1 and 2 msec sleeps,
rather than the rapidly but randomly increasing sleep times that are intended.
(At first it looked like it was sleeping on a different lock attempt each
time rather than re-sleeping, but some additional logging showed this was not
the case; it was re-sleeping on the same lock attempt.)
The problem is that the state is kept only as an integer number of
milliseconds, starting at 1, so it can take several attempts for the
random increment to move it from 1 to 2, and then from 2 to 3.
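To illustrate, here is a standalone sketch of the escalation loop (just the
rough shape of the one in s_lock.c, not the real code): with the state kept in
whole milliseconds, the random increment is frequently truncated to zero, so
the first several sleeps tend to dwell at 1 and 2 msec.

    #include <stdio.h>
    #include <stdlib.h>

    #define MIN_DELAY_MSEC 1
    #define MAX_DELAY_MSEC 1000

    int
    main(void)
    {
        int     cur_delay = MIN_DELAY_MSEC;
        int     i;

        srand(1);
        for (i = 0; i < 12; i++)
        {
            printf("would sleep %d msec\n", cur_delay);
            /* increase delay by a random fraction between 1X and 2X */
            cur_delay += (int) (cur_delay *
                                ((double) rand() / (double) RAND_MAX) + 0.5);
            /* wrap back to minimum delay when max is exceeded */
            if (cur_delay > MAX_DELAY_MSEC)
                cur_delay = MIN_DELAY_MSEC;
        }
        return 0;
    }

With cur_delay = 1, the increment rounds to zero about half the time, so the
delay sits at 1 msec for a couple of iterations on average before it can move.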
If the state were instead maintained as an integer number of microseconds
starting at 1000, this granularity problem would go away.
And once the state is kept in microseconds, I see no point in not passing
the full microsecond precision on to pg_usleep.
The attached patch changes the resolution of the state variable to
microseconds, but keeps the starting value at 1 msec, i.e. 1000 usec.
Arguably the MIN_DELAY_USEC value should be reduced below 1000 as well given
current hardware, but I'll leave that as a separate issue.
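For comparison, the same sketch with the state kept in microseconds (again
just an illustration of the idea, not the patch itself): the increment is
essentially never truncated to zero, so the delay grows on nearly every
re-sleep, and the value can be handed straight to pg_usleep, which already
takes microseconds.

    #include <stdio.h>
    #include <stdlib.h>

    #define MIN_DELAY_USEC 1000         /* still start at 1 msec */
    #define MAX_DELAY_USEC 1000000

    int
    main(void)
    {
        int     cur_delay = MIN_DELAY_USEC;
        int     i;

        srand(1);
        for (i = 0; i < 12; i++)
        {
            printf("would sleep %d usec\n", cur_delay);
            /* increase delay by a random fraction between 1X and 2X */
            cur_delay += (int) (cur_delay *
                                ((double) rand() / (double) RAND_MAX) + 0.5);
            /* wrap back to minimum delay when max is exceeded */
            if (cur_delay > MAX_DELAY_USEC)
                cur_delay = MIN_DELAY_USEC;
        }
        return 0;
    }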
There is the comment:

 * The pg_usleep() delays are measured in milliseconds because 1 msec is a
 * common resolution limit at the OS level for newer platforms. On older
 * platforms the resolution limit is usually 10 msec, in which case the
 * total delay before timeout will be a bit more.
I think the first sentence is out of date (CentOS 6.3 on an "Intel(R) Xeon(R)
CPU E5-2650 0 @ 2.00GHz", not exactly bleeding edge, seems to have a
resolution of 100 usec), and also somewhat bogus (what benefit do we get by
preemptively rounding to some level just because some common platforms will
round to it anyway?). I've just removed the whole comment, though maybe some
version of the last sentence needs to be put back.
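For what it's worth, a quick way to eyeball the effective granularity on a
given box is something like the following standalone sketch (clock_gettime
plus usleep, nothing PostgreSQL-specific; the numbers will of course vary by
kernel and hardware):

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int
    main(void)
    {
        struct timespec start, end;
        int     i;

        for (i = 0; i < 5; i++)
        {
            clock_gettime(CLOCK_MONOTONIC, &start);
            usleep(100);            /* request a 100 usec sleep */
            clock_gettime(CLOCK_MONOTONIC, &end);
            printf("actual: %ld usec\n",
                   (end.tv_sec - start.tv_sec) * 1000000L +
                   (end.tv_nsec - start.tv_nsec) / 1000);
        }
        return 0;
    }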
I have no evidence that the current granularity of the random escalation is
actually causing performance problems, only that it causes confusion for
people following along. But I think the change can be justified purely on
the grounds that it is cleaner code and more future-proof.
This could be made obsolete by changing to futexes, but I doubt that is
going to happen soon, and certainly not for all platforms.
I will add this to the next commitfest.
Cheers,
Jeff