Tuning Checkpoints

From: Andre Henry <ahenry(at)thinkthroughmath(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Tuning Checkpoints
Date: 2016-10-31 19:19:58
Message-ID: AFC9F2D9-5A46-470D-A251-7FD3CEE501DB@thinkthroughmath.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

My PG 9.4.5 server runs on Amazon RDS some times of the day we have a lot of checkpoints really close (less than 1 minute apart, see logs below) and we are trying to tune the DB to minimize the impact of the checkpoint or reduce the number of checkpoints.

Server Stats
· Instance Type db.r3.4xl
• 16 vCPUs 122GB of RAM
• PostgreSQL 9.4.5 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16), 64-bit

Some PG Stats
• Shared Buffers = 31427608kB
• Checkpoint Segments = 64
• Checkpoint completion target = .9
• Rest of the configuration is below

Things we are doing
• We have a huge table where each row is over 1kB and its very busy. We are splitting that into multiple tables especially the one json field that making it large.

Questions
• Each checkpoint log writes out the following checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s; sync files=274, longest=0.049 s, average=0.000 s
• What does buffers mean? How do I find out how much RAM that is equivalent to?
• Based on my RDS stats I don't think IOPs will help, because I don't see any flat lines on my write operations / second graph. Is this a good assumption?
• What else can we tune to spread out checkpoints?

Here is the Postgres configuration

application_name | psql
archive_mode | on
archive_timeout | 5min
autovacuum_analyze_scale_factor | 0.05
autovacuum_naptime | 30s
autovacuum_vacuum_scale_factor | 0.1
checkpoint_completion_target | 0.9
checkpoint_segments | 64
client_encoding | UTF8
effective_cache_size | 62855216kB
fsync | on
full_page_writes | on
hot_standby | off
listen_addresses | *
lo_compat_privileges | off
log_checkpoints | on
log_error_verbosity | default
log_file_mode | 0644
log_hostname | on
log_line_prefix | %t:%r:%u(at)%d:[%p]:
log_rotation_age | 1h
log_timezone | UTC
log_truncate_on_rotation | off
logging_collector | on
maintenance_work_mem | 250MB
max_connections | 4092
max_locks_per_transaction | 64
max_prepared_transactions | 0
max_replication_slots | 5
max_stack_depth | 6MB
max_wal_senders | 10
port | 5432
rds.extensions | btree_gin,btree_gist,chkpass,citext,cube,dblink,dict_int,dict_xsyn,earthdistance,fuzzystrmatch,hstore,intagg,intarray,ip4r,isn,ltree,pgcrypto,pgrowlocks,pgstattuple,pg_buffercache,pg_prewarm,pg_stat_statements,pg_trgm,plcoffee,plls,plperl,plpgsql,pltcl,plv8,postgis,postgis_tiger_geocoder,postgis_topology,postgres_fdw,sslinfo,tablefunc,test_parser,tsearch2,unaccent,uuid-ossp
rds.internal_databases | rdsadmin,template0
rds.rds_superuser_reserved_connections | 0
rds.superuser_variables | session_replication_role
shared_buffers | 31427608kB
ssl | on
ssl_renegotiation_limit | 0
superuser_reserved_connections | 3
synchronous_commit | off
TimeZone | UTC
unix_socket_group | rdsdb
unix_socket_permissions | 0700
wal_keep_segments | 32
wal_level | hot_standby
wal_receiver_timeout | 30s
wal_sender_timeout | 30s
work_mem | 8MB

Here are some checkpoint logs from a busy period

2016-10-26 18:54:32 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:10 UTC::@:[5594]:LOG: checkpoint complete: wrote 168357 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=37.650 s, sync=0.036 s, total=37.879 s; sync files=266, longest=0.027 s, average=0.000 s
2016-10-26 18:55:15 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:55:53 UTC::@:[5594]:LOG: checkpoint complete: wrote 171087 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=38.147 s, sync=0.116 s, total=38.298 s; sync files=276, longest=0.098 s, average=0.000 s
2016-10-26 18:55:58 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:56:34 UTC::@:[5594]:LOG: checkpoint complete: wrote 171131 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.080 s, sync=0.142 s, total=35.323 s; sync files=274, longest=0.103 s, average=0.000 s
2016-10-26 18:56:39 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:57:14 UTC::@:[5594]:LOG: checkpoint complete: wrote 158818 buffers (4.0%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.545 s, sync=0.072 s, total=35.771 s; sync files=277, longest=0.036 s, average=0.000 s
2016-10-26 18:57:19 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:57:56 UTC::@:[5594]:LOG: checkpoint complete: wrote 179405 buffers (4.6%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=36.289 s, sync=0.069 s, total=36.405 s; sync files=295, longest=0.030 s, average=0.000 s
2016-10-26 18:58:01 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:58:37 UTC::@:[5594]:LOG: checkpoint complete: wrote 169711 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=35.248 s, sync=0.199 s, total=35.511 s; sync files=295, longest=0.146 s, average=0.000 s
2016-10-26 18:58:42 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:59:17 UTC::@:[5594]:LOG: checkpoint complete: wrote 174124 buffers (4.4%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=34.666 s, sync=0.059 s, total=34.790 s; sync files=279, longest=0.051 s, average=0.000 s
2016-10-26 18:59:23 UTC::@:[5594]:LOG: checkpoint starting: xlog
2016-10-26 18:59:55 UTC::@:[5594]:LOG: checkpoint complete: wrote 166481 buffers (4.2%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=32.441 s, sync=0.050 s, total=32.550 s; sync files=274, longest=0.049 s, average=0.000 s

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2016-10-31 21:59:50 Re: Tuning Checkpoints
Previous Message Joshua D. Drake 2016-10-28 19:33:10 Re: Big Memory Boxes and pgtune