From: | Greg Stark <stark(at)mit(dot)edu> |
---|---|
To: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: measuring lwlock-related latency spikes |
Date: | 2012-04-01 22:12:05 |
Message-ID: | CAM-w4HP7ux5rNOUfAkLG8Y=dKS+JvZ-E6XroW+kvCqUdrRYAOA@mail.gmail.com |
Lists: | pgsql-hackers |

On Sun, Apr 1, 2012 at 10:27 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> So lock starvation on the control lock would cause a long wait after
> each I/O, making it look like an I/O problem.

Except that both of the locks involved in his smoking gun occur
*after* the control lock has already been acquired. The one that's
actually being blocked for a long time is in fact acquiring a shared
lock, which the queue jumping couldn't be hurting.

We know you're convinced about the queue jumping being a problem, and
it's definitely a plausible problem, but I think you need exactly the
kind of instrumentation Robert is doing here to test that theory.
Without it, even if everyone agreed it was a real problem, we would have
no idea whether a proposed change fixed it.

FWIW, this instrumentation is *amazing*. As a user, this kind of rare
random stall is precisely the kind of thing that totally kills me. I
would so much rather run a web site on a database where each query
took twice as long but which guaranteed that no query would take over a
second than on one that was twice as fast on average but occasionally
got stuck for 12s.

--
greg