From: | Shaun Thomas <sthomas(at)optionshouse(dot)com> |
---|---|
To: | "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org> |
Subject: | PSA: New Kernels and intel_idle cpuidle Driver! |
Date: | 2012-10-26 17:58:57 |
Message-ID: | 508ACF61.1000602@optionshouse.com |
Lists: | pgsql-performance |
Hey guys,
I have a pretty nasty heads-up. If you have hardware using an Intel Xeon
and a newer Linux kernel, you may be experiencing very high CPU latency.
You can check yourself:
cat /sys/devices/system/cpu/cpuidle/current_driver
If it says intel_idle, the Linux kernel will *aggressively* put your CPU
to sleep. We definitely noticed this, and it's pretty darn painful. But
it's *more* painful in your asynchronous, standby, or otherwise less
busy nodes. Why?
As you can imagine, the secondary nodes don't get much activity, so they
spend most of their time sleeping. That means the CPU sleeps far more
often, and pays the wake latency every time it has to copy data or
process new WAL traffic.
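For anyone who wants to see which idle states the driver is allowed to enter, and how long each takes to wake from, here is a minimal sketch that reads the standard Linux cpuidle sysfs files. The helper name show_cstates is made up for illustration, and it assumes cpu0 is representative of the other cores.

```shell
#!/bin/sh
# show_cstates is a hypothetical helper, not an existing tool.
# The sysfs root is a parameter so it can be pointed at a test directory.
show_cstates() {
    root=${1:-/sys/devices/system/cpu}
    echo "driver: $(cat "$root/cpuidle/current_driver")"
    # Assumption: cpu0 exposes the same idle states as the other cores.
    for state in "$root"/cpu0/cpuidle/state*; do
        [ -d "$state" ] || continue
        # "latency" is the exit (wake-up) latency in microseconds.
        echo "$(cat "$state/name"): $(cat "$state/latency") us"
    done
}
```

Calling show_cstates with no argument reads the live system. The deeper states are the ones with the larger wake latencies.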
To fix this, you must actually hint to, or outright disable, the driver
by picking your own C-state, probably the one you wanted in the BIOS in
the first place. We did this by adding the following options to
GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, but your distro may differ.
intel_idle.max_cstate=0 processor.max_cstate=0 idle=mwait
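On a Debian/Ubuntu-style system the edit could be scripted roughly as below. This is only a sketch: add_idle_opts is a hypothetical helper, it assumes the GRUB_CMDLINE_LINUX_DEFAULT value is double-quoted on a single line, and it leaves a .bak copy next to the file it changes.

```shell
#!/bin/sh
# Options taken from the post above.
OPTS='intel_idle.max_cstate=0 processor.max_cstate=0 idle=mwait'

# add_idle_opts is a hypothetical helper, not part of GRUB.
add_idle_opts() {
    file=$1
    # Do nothing if the options were already added.
    if grep -q 'intel_idle.max_cstate=' "$file"; then
        return 0
    fi
    cp "$file" "$file.bak"
    # Insert the options just before the closing double quote.
    sed "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 $OPTS\"/" \
        "$file.bak" > "$file"
}

# Usage (as root):
#   add_idle_opts /etc/default/grub
#   update-grub    # Debian/Ubuntu; other distros typically use grub2-mkconfig
```

After the reboot, cat /proc/cmdline should show the new options.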
Then reboot. Here are the benefits we got:
* %util difference between backing device and DRBD went down by 30-40%
on our replicating nodes.
* TCP RTT is almost 10x lower.
I'm totally not kidding about that last one. Due to the time necessary
to wake a CPU to handle the network traffic, latency was massively
increased using the intel_idle driver. Our RTT average was 0.375ms on a
10G link before. Now it's 0.04ms after using the settings above.
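A rough way to get comparable before/after numbers is sketched below. It uses ICMP ping as a stand-in for TCP RTT, which is an assumption (the post does not say how the RTT was measured), and it expects the summary line printed by Linux iputils ping. The helpers avg_rtt and speedup and the host name standby-host are made up for illustration.

```shell
#!/bin/sh
# avg_rtt: read ping output on stdin and print the average RTT in ms.
# Assumes the iputils summary line:
#   rtt min/avg/max/mdev = 0.035/0.040/0.051/0.004 ms
avg_rtt() {
    awk -F'/' '/^rtt/ { print $5 }'
}

# speedup: ratio of the old average to the new one, to one decimal place.
speedup() {
    awk -v before="$1" -v after="$2" 'BEGIN { printf "%.1f\n", before / after }'
}

# Usage (standby-host is a placeholder for the replication peer):
#   ping -c 100 -q standby-host | avg_rtt
#   speedup 0.375 0.04    # prints 9.4 for the averages quoted above
```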
Consider this a PSA. DRBD is unfairly being blamed for bad performance
that is really caused by the intel_idle cpuidle driver in newer kernels!
If you have DRBD on
a newer Intel system, I highly recommend you make the above changes,
especially since it directly affects your replication speed.
It took us days to figure this out, so I figured I'd share.
Thanks, everyone!
--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas(at)optionshouse(dot)com