From: | Ben Chobot <bench(at)silentmedia(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | in defensive of zone_reclaim_mode on linux |
Date: | 2015-09-04 22:37:47 |
Message-ID: | 0A4AC797-623E-4E2A-819E-FD8B3EB1886B@silentmedia.com |
Lists: | pgsql-general |
Over the last several months, I've seen a lot of grumbling about how zone_reclaim_mode eats babies, kicks puppies, and basically how you should just turn it off and live happily ever after. I thought I should add a counterexample, because that advice has not proven very good for us.
Some facts about us:
- postgres 9.3.9
- ubuntu trusty kernels (3.13.0-29-generic #53~precise1-Ubuntu)
- everything in AWS, on 32-core HVM instances with 60GB of ram
- 6GB shared buffers
- mostly simple queries
Generally, this has worked out pretty well for us. However, we've recently added a bunch more load, which, because we're sharded and each shard has its own user, means we've added more concurrently active users. ("A bunch" = ~300.) We are big pgBouncer users, but because we also use transaction pooling, pgBouncer can only do so much to reuse existing connections. (SET ROLE isn't an option.)
The end result is that recently, we've been running a dumb number of backends (between 600 and 1k) - which are *usually* mostly idle, but there are frequent spikes of activity when dozens of them wake up at once. Even worse, those spikes tend to also come with connection churn, as pgBouncer tears down existing idle connections to build up new backends for different users.
So our load would hover under 10 most of the time, then spike to over 100 for a minute or two. Connections would get refused, the system would freeze up... and then everything would go back to normal. The solution? Turning on zone_reclaim_mode.
It appears that connection churn is far more manageable to Linux with zone_reclaim_mode enabled. I suspect that our dearth of large, complex queries helps us out as well. Regardless, our systems no longer desperately seek free memory when many idle backends wake up while others are getting torn down and replaced. Babies and puppies rejoice.
Our situation might not apply to you. But if it does, give zone_reclaim_mode a chance. It's not (always) as bad as others have made it out to be.
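If you want to see what a given box is currently set to before experimenting, here is a minimal Python sketch. It assumes the usual Linux sysctl path /proc/sys/vm/zone_reclaim_mode and the bit meanings from the kernel's vm sysctl documentation (1 = zone reclaim on, 2 = write dirty pages out, 4 = swap pages). The function names are made up for illustration, and nothing above says which value was used beyond "on".

```python
from pathlib import Path
from typing import List

# Bit meanings per the kernel's vm sysctl documentation; the value is a bitmask.
ZONE_RECLAIM_BITS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def decode_zone_reclaim_mode(value: int) -> List[str]:
    """Return the labels of the bits set in a zone_reclaim_mode value."""
    return [label for bit, label in ZONE_RECLAIM_BITS.items() if value & bit]


def read_zone_reclaim_mode(path: str = "/proc/sys/vm/zone_reclaim_mode") -> int:
    """Read the current setting; the default path is assumed to exist (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    flags = decode_zone_reclaim_mode(mode)
    print("vm.zone_reclaim_mode = {}: {}".format(mode, ", ".join(flags) or "disabled"))
```

Changing the setting is normally done as root with `sysctl -w vm.zone_reclaim_mode=1`, which does not survive a reboot unless it is also added to your sysctl configuration.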