From: | Jim Nasby <jim(at)nasby(dot)net> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com> |
Subject: | Re: checkpoint patches |
Date: | 2012-03-25 20:29:01 |
Message-ID: | 4F6F800D.8000808@nasby.net |
Lists: | pgsql-hackers |
On 3/23/12 7:38 AM, Robert Haas wrote:
> And here are the latency results for 95th-100th percentile with
> checkpoint_timeout=16min.
>
> ckpt.master.13: 1703, 1830, 2166, 17953, 192434, 43946669
> ckpt.master.14: 1728, 1858, 2169, 15596, 187943, 9619191
> ckpt.master.15: 1700, 1835, 2189, 22181, 206445, 8212125
>
> The picture looks similar here. Increasing checkpoint_timeout isn't
> *quite* as good as spreading out the fsyncs, but it's pretty darn
> close. For example, looking at the median of the three 98th
> percentile numbers for each configuration, the patch bought us a 28%
> improvement in 98th percentile latency. But increasing
> checkpoint_timeout by a minute bought us a 15% improvement in 98th
> percentile latency. So it's still not clear to me that the patch is
> doing anything on this test that you couldn't get just by increasing
> checkpoint_timeout by a few more minutes. Granted, it lets you keep
> your inter-checkpoint interval slightly smaller, but that's not that
> exciting. That having been said, I don't have a whole lot of trouble
> believing that there are other cases where this is more worthwhile.
I wouldn't be too quick to dismiss the value of checkpointing more frequently (i.e., running with a lower checkpoint_timeout).
On a high-value production system you're going to care quite a bit about recovery time, since crash recovery has to replay the WAL written since the last checkpoint. I certainly wouldn't want to run our systems with checkpoint_timeout='15 min' if I could avoid it.
Another $0.02: I don't recall the community using pgbench much at all to measure latency; I believe that's fairly new. I point this out because I believe TPS and latency call for different kinds of analysis. I think Robert's graphs support my argument: the numeric X-percentile data might not look terribly good, but reducing peak latency from 100ms to 60ms could be a really big deal on a lot of systems. My intuition is that one or both of these patches would actually be valuable in the real world; it would be a shame to throw them out because we're not sure how to performance test them.
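To make the percentile comparison concrete, here is a minimal Python sketch of the "median of the three runs" summary Robert describes, applied to the checkpoint_timeout=16min figures quoted above. It assumes the six columns map to the 95th through 100th percentiles in order; the function names are mine, and the units are whatever the original test reported.

```python
from statistics import median

# Assumption: the six quoted columns are the 95th..100th percentiles, in order.
PERCENTILES = [95, 96, 97, 98, 99, 100]

# Latency figures copied from the quoted checkpoint_timeout=16min runs.
RUNS = {
    "ckpt.master.13": [1703, 1830, 2166, 17953, 192434, 43946669],
    "ckpt.master.14": [1728, 1858, 2169, 15596, 187943, 9619191],
    "ckpt.master.15": [1700, 1835, 2189, 22181, 206445, 8212125],
}


def median_at(percentile):
    """Median across the three runs of the latency at one percentile."""
    idx = PERCENTILES.index(percentile)
    return median(run[idx] for run in RUNS.values())


def summarize():
    """Map each percentile to its median-of-runs latency."""
    return {p: median_at(p) for p in PERCENTILES}


if __name__ == "__main__":
    for p, value in summarize().items():
        print(f"{p}th percentile, median of 3 runs: {value}")
```

The 100th-percentile column is the peak latency for each run. It varies far more between runs than the lower percentiles do, which is part of why I think peak latency deserves a separate look from the percentile summary.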
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net