Re: postgresql.conf recommendations

From: Josh Krupka <jkrupka(at)gmail(dot)com>
To: Johnny Tan <johnnydtan(at)gmail(dot)com>
Cc: Alex Kahn <alex(at)paperlesspost(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: postgresql.conf recommendations
Date: 2013-02-05 22:37:21
Message-ID: CAB6McgXMFS2XD_inm5LMSmZpYMoXm8mzF6t3JXVyJAqi_-98OA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Just out of curiosity, are you using transparent huge pages?
On Feb 5, 2013 5:03 PM, "Johnny Tan" <johnnydtan(at)gmail(dot)com> wrote:

> Server specs:
> Dell R610
> dual E5645 hex-core 2.4GHz
> 192GB RAM
> RAID 1: 2x400GB SSD (OS + WAL logs)
> RAID 10: 4x400GB SSD (/var/lib/pgsql)
>
>
> /etc/sysctl.conf:
> kernel.msgmnb = 65536
> kernel.msgmax = 65536
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> vm.overcommit_memory = 0
> vm.swappiness = 0
> vm.dirty_background_bytes = 536870912
> vm.dirty_bytes = 536870912
>
>
> postgresql.conf:
> listen_addresses = '*' # what IP address(es) to listen on;
> max_connections = 150 # (change requires restart)
> shared_buffers = 48GB # min 128kB
> work_mem = 1310MB # min 64kB
> maintenance_work_mem = 24GB # min 1MB
> wal_level = hot_standby # minimal, archive, or hot_standby
> checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
> checkpoint_timeout = 30min # range 30s-1h
> checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
> max_wal_senders = 5 # max number of walsender processes
> wal_keep_segments = 2000 # in logfile segments, 16MB each; 0 disables
> hot_standby = on # "on" allows queries during recovery
> max_standby_archive_delay = 120s # max delay before canceling queries
> max_standby_streaming_delay = 120s # max delay before canceling queries
> effective_cache_size = 120GB
> constraint_exclusion = partition # on, off, or partition
> log_destination = 'syslog' # Valid values are combinations of
> logging_collector = on # Enable capturing of stderr and csvlog
> log_directory = 'pg_log' # directory where log files are written,
> log_filename = 'postgresql-%a.log' # log file name pattern,
> log_truncate_on_rotation = on # If on, an existing log file with the
> log_rotation_age = 1d # Automatic rotation of logfiles will
> log_rotation_size = 0 # Automatic rotation of logfiles will
> log_min_duration_statement = 500 # -1 is disabled, 0 logs all statements
> log_checkpoints = on
> log_line_prefix = 'user=%u db=%d remote=%r ' # special values:
> log_lock_waits = on # log lock waits >= deadlock_timeout
> autovacuum = on # Enable autovacuum subprocess? 'on'
> log_autovacuum_min_duration = 0 # -1 disables, 0 logs all actions and
> autovacuum_max_workers = 5 # max number of autovacuum subprocesses
> datestyle = 'iso, mdy'
> lc_messages = 'en_US.UTF-8' # locale for system error message
> lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
> lc_numeric = 'en_US.UTF-8' # locale for number formatting
> lc_time = 'en_US.UTF-8' # locale for time formatting
> default_text_search_config = 'pg_catalog.english'
> deadlock_timeout = 300ms
>
>
> per pgtune:
>
> #------------------------------------------------------------------------------
> # pgtune wizard run on 2013-02-05
> # Based on 198333224 KB RAM in the server
>
> #------------------------------------------------------------------------------
> default_statistics_target = 100
> maintenance_work_mem = 1GB
> checkpoint_completion_target = 0.9
> effective_cache_size = 128GB
> work_mem = 1152MB
> wal_buffers = 8MB
> checkpoint_segments = 16
> shared_buffers = 44GB
> max_connections = 80
>
> We use pgbouncer (set to 140 connections) in transaction pooling mode in
> front of our db.
>
>
> The problem:
>
> For the most part, the server hums along. No other applications run on
> this server other than postgres. Load averages rarely break 2.0, it never
> swaps, and %iowait is usually not more than 0.12
>
> But periodically, there are spikes in our app's db response time.
> Normally, the app's db response time hovers in the 100ms range for most of
> the day. During the spike times, it can go up to 1000ms or 1500ms, and the
> number of pg connections goes to 140 (maxed out to pgbouncer's limit, where
> normally it's only about 20-40 connections). Also, during these times,
> which usually last less than 2 minutes, we will see several thousand
> queries in the pg log (this is with log_min_duration_statement = 500),
> compared to maybe one or two dozen 500ms+ queries in non-spike times.
>
> Inbetween spikes could be an hour, two hours, sometimes half a day. There
> doesn't appear to be any pattern that we can see:
> * there are no obvious queries that are locking the db
> * it doesn't necessarily happen during high-traffic times, though it can
> * it doesn't happen during any known system, db, or app
> regularly-scheduled job, including crons
> * in general, there's no discernible regularity to it at all
> * it doesn't coincide with checkpoint starts or completions
> * it doesn't coincide with autovacuums
> * there are no messages in any system logs that might indicate any system
> or hardware-related issue
>
> Besides spikes in our graphs, the only other visible effect is that
> %system in sar goes from average of 0.7 to as high as 10.0 or so (%iowait
> and all other sar variables remain the same).
>
> And according to our monitoring system, web requests get queued up, and
> our alerting system sometimes either says there's a timeout or that it had
> multiple web response times greater than 300ms, and so we suspect (but have
> no proof) that some users will see either a long hang or possibly a
> timeout. But since it's almost always less than two minutes, and sometimes
> less than one, we don't really hear any complaints (guessing that most
> people hit reload, and things work again, so they continue on), and we
> haven't been able to see any negative effect ourselves.
>
> But we want to get in front of the problem, in case it is something that
> will get worse as traffic continues to grow. We've tweaked various configs
> on the OS side as well as the postgresql.conf side. What's posted above is
> our current setup, and the problem persists.
>
> Any ideas as to where we could even look?
>
> Also, whether related or unrelated to the spikes, are there any
> recommendations for our postgresql.conf or sysctl.conf based on our
> hardware? From pgtune's output, I am lowering maintenance_work_mem from
> 24GB down to maybe 2GB, but I keep reading conflicting things about other
> settings, such as checkpoints or max_connections.
>
> johnny
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Johnny Tan 2013-02-05 23:46:37 Re: postgresql.conf recommendations
Previous Message Johnny Tan 2013-02-05 22:02:21 postgresql.conf recommendations