Re: postgresql.conf recommendations

From: David Whittaker <dave(at)iradix(dot)com>
To: Josh Krupka <jkrupka(at)gmail(dot)com>
Cc: Johnny Tan <johnnydtan(at)gmail(dot)com>, Alex Kahn <alex(at)paperlesspost(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: postgresql.conf recommendations
Date: 2013-02-06 15:42:11
Message-ID: CABXnLXQDT+Y9aYLZMkzXAQaqggLa4OphbjL1i-0vaANb+CNeZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Josh/Johnny,

We've been seeing a similar problem as well, and had also figured THP was
involved. We found this in syslog:
https://gist.github.com/davewhittaker/4723285, which led us to disable THP
2 days ago. At first the results seemed good. In particular, our issues
always seemed interrupt related and the average interrupts/sec immediately
dropped from 7k to around 3k after restarting. The good news is that we
didn't see any spike in system CPU time yesterday. The bad news is that we
did see a spike in app latency that originated from the DB, but now the
spike is in user CPU time and seems to be spread across all of the running
postgres processes. Interrupts still blew up to 21k/sec when it happened.
We are still diagnosing, but I'd be curious to see if either of you get
similar results from turning THP off.

On Tue, Feb 5, 2013 at 11:23 PM, Josh Krupka <jkrupka(at)gmail(dot)com> wrote:

> I've been looking into something on our system that sounds similar to what
> you're seeing. I'm still researching it, but I'm suspecting the memory
> compaction that runs as part of transparent huge pages when memory is
> allocated... yet to be proven. The tunable you mentioned controls the
> compaction process that runs at allocation time so it can try to allocate
> large pages, there's a separate one that controls if the compaction is done
> in khugepaged, and a separate one that controls whether THP is used at all
> or not (/sys/kernel/mm/transparent_hugepage/enabled, or perhaps different
> in your distro)
>
> What's the output of this command?
> egrep 'trans|thp|compact_' /proc/vmstat
> compact_stall represents the number of processes that were stalled to do
> a compaction, the other metrics have to do with other parts of THP. If you
> see compact_stall climbing, from what I can tell those might be causing
> your spikes. I haven't found a way of telling how long the processes have
> been stalled. You could probably get a little more insight into the
> processes with some tracing assuming you can catch it quickly enough.
> Running perf top will also show the compaction happening but that doesn't
> necessarily mean it's impacting your running processes.
>
>
>
>
> On Tue, Feb 5, 2013 at 6:46 PM, Johnny Tan <johnnydtan(at)gmail(dot)com> wrote:
>
>> # cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
>> [always] never
>>
>>
>> On Tue, Feb 5, 2013 at 5:37 PM, Josh Krupka <jkrupka(at)gmail(dot)com> wrote:
>>
>>> Just out of curiosity, are you using transparent huge pages?
>>> On Feb 5, 2013 5:03 PM, "Johnny Tan" <johnnydtan(at)gmail(dot)com> wrote:
>>>
>>>> Server specs:
>>>> Dell R610
>>>> dual E5645 hex-core 2.4GHz
>>>> 192GB RAM
>>>> RAID 1: 2x400GB SSD (OS + WAL logs)
>>>> RAID 10: 4x400GB SSD (/var/lib/pgsql)
>>>>
>>>>
>>>> /etc/sysctl.conf:
>>>> kernel.msgmnb = 65536
>>>> kernel.msgmax = 65536
>>>> kernel.shmmax = 68719476736
>>>> kernel.shmall = 4294967296
>>>> vm.overcommit_memory = 0
>>>> vm.swappiness = 0
>>>> vm.dirty_background_bytes = 536870912
>>>> vm.dirty_bytes = 536870912
>>>>
>>>>
>>>> postgresql.conf:
>>>> listen_addresses = '*' # what IP address(es) to listen on;
>>>> max_connections = 150 # (change requires restart)
>>>> shared_buffers = 48GB # min 128kB
>>>> work_mem = 1310MB # min 64kB
>>>> maintenance_work_mem = 24GB # min 1MB
>>>> wal_level = hot_standby # minimal, archive, or hot_standby
>>>> checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
>>>> checkpoint_timeout = 30min # range 30s-1h
>>>> checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 -
>>>> 1.0
>>>> max_wal_senders = 5 # max number of walsender processes
>>>> wal_keep_segments = 2000 # in logfile segments, 16MB each; 0 disables
>>>> hot_standby = on # "on" allows queries during recovery
>>>> max_standby_archive_delay = 120s # max delay before canceling queries
>>>> max_standby_streaming_delay = 120s # max delay before canceling queries
>>>> effective_cache_size = 120GB
>>>> constraint_exclusion = partition # on, off, or partition
>>>> log_destination = 'syslog' # Valid values are combinations of
>>>> logging_collector = on # Enable capturing of stderr and csvlog
>>>> log_directory = 'pg_log' # directory where log files are written,
>>>> log_filename = 'postgresql-%a.log' # log file name pattern,
>>>> log_truncate_on_rotation = on # If on, an existing log file with the
>>>> log_rotation_age = 1d # Automatic rotation of logfiles will
>>>> log_rotation_size = 0 # Automatic rotation of logfiles will
>>>> log_min_duration_statement = 500 # -1 is disabled, 0 logs all
>>>> statements
>>>> log_checkpoints = on
>>>> log_line_prefix = 'user=%u db=%d remote=%r ' # special values:
>>>> log_lock_waits = on # log lock waits >= deadlock_timeout
>>>> autovacuum = on # Enable autovacuum subprocess? 'on'
>>>> log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
>>>> autovacuum_max_workers = 5 # max number of autovacuum subprocesses
>>>> datestyle = 'iso, mdy'
>>>> lc_messages = 'en_US.UTF-8' # locale for system error message
>>>> lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
>>>> lc_numeric = 'en_US.UTF-8' # locale for number formatting
>>>> lc_time = 'en_US.UTF-8' # locale for time formatting
>>>> default_text_search_config = 'pg_catalog.english'
>>>> deadlock_timeout = 300ms
>>>>
>>>>
>>>> per pgtune:
>>>>
>>>> #------------------------------------------------------------------------------
>>>> # pgtune wizard run on 2013-02-05
>>>> # Based on 198333224 KB RAM in the server
>>>>
>>>> #------------------------------------------------------------------------------
>>>> default_statistics_target = 100
>>>> maintenance_work_mem = 1GB
>>>> checkpoint_completion_target = 0.9
>>>> effective_cache_size = 128GB
>>>> work_mem = 1152MB
>>>> wal_buffers = 8MB
>>>> checkpoint_segments = 16
>>>> shared_buffers = 44GB
>>>> max_connections = 80
>>>>
>>>> We use pgbouncer (set to 140 connections) in transaction pooling mode
>>>> in front of our db.
>>>>
>>>>
>>>> The problem:
>>>>
>>>> For the most part, the server hums along. No other applications run on
>>>> this server other than postgres. Load averages rarely break 2.0, it never
>>>> swaps, and %iowait is usually not more than 0.12
>>>>
>>>> But periodically, there are spikes in our app's db response time.
>>>> Normally, the app's db response time hovers in the 100ms range for most of
>>>> the day. During the spike times, it can go up to 1000ms or 1500ms, and the
>>>> number of pg connections goes to 140 (maxed out to pgbouncer's limit, where
>>>> normally it's only about 20-40 connections). Also, during these times,
>>>> which usually last less than 2 minutes, we will see several thousand
>>>> queries in the pg log (this is with log_min_duration_statement = 500),
>>>> compared to maybe one or two dozen 500ms+ queries in non-spike times.
>>>>
>>>> Inbetween spikes could be an hour, two hours, sometimes half a day.
>>>> There doesn't appear to be any pattern that we can see:
>>>> * there are no obvious queries that are locking the db
>>>> * it doesn't necessarily happen during high-traffic times, though it can
>>>> * it doesn't happen during any known system, db, or app
>>>> regularly-scheduled job, including crons
>>>> * in general, there's no discernible regularity to it at all
>>>> * it doesn't coincide with checkpoint starts or completions
>>>> * it doesn't coincide with autovacuums
>>>> * there are no messages in any system logs that might indicate any
>>>> system or hardware-related issue
>>>>
>>>> Besides spikes in our graphs, the only other visible effect is that
>>>> %system in sar goes from average of 0.7 to as high as 10.0 or so (%iowait
>>>> and all other sar variables remain the same).
>>>>
>>>> And according to our monitoring system, web requests get queued up, and
>>>> our alerting system sometimes either says there's a timeout or that it had
>>>> multiple web response times greater than 300ms, and so we suspect (but have
>>>> no proof) that some users will see either a long hang or possibly a
>>>> timeout. But since it's almost always less than two minutes, and sometimes
>>>> less than one, we don't really hear any complaints (guessing that most
>>>> people hit reload, and things work again, so they continue on), and we
>>>> haven't been able to see any negative effect ourselves.
>>>>
>>>> But we want to get in front of the problem, in case it is something
>>>> that will get worse as traffic continues to grow. We've tweaked various
>>>> configs on the OS side as well as the postgresql.conf side. What's posted
>>>> above is our current setup, and the problem persists.
>>>>
>>>> Any ideas as to where we could even look?
>>>>
>>>> Also, whether related or unrelated to the spikes, are there any
>>>> recommendations for our postgresql.conf or sysctl.conf based on our
>>>> hardware? From pgtune's output, I am lowering maintenance_work_mem from
>>>> 24GB down to maybe 2GB, but I keep reading conflicting things about other
>>>> settings, such as checkpoints or max_connections.
>>>>
>>>> johnny
>>>>
>>>>
>>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2013-02-06 16:22:40 Re: Slow Query Help
Previous Message Will Platnick 2013-02-06 13:17:50 Re: Slow Query Help