Re: High replication lag - Stream Replication

From: Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>
To: Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: High replication lag - Stream Replication
Date: 2017-11-08 14:09:06
Message-ID: CAGDYbUOP4AAxFAmdm1LpxJDJfGkCzCLoTnosqYksriL=77eSqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Wed, Nov 8, 2017 at 7:08 PM, Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br
> wrote:

> Good afternoon,
>
> I have two servers hosted on Azure with PostgreSQL 9.6 databases. One
> database is production, the other is standby.
> For replication I use stream replication between servers. However the
> application of the archives in the standby database is slow, always
> generating replication lag.
>
> The replication lag increases considerably when queries are performed on
> the StandBy database. I wonder if there is any setting that can be made to
> improve this performance.
>
> *Server Production*
> SO: CentOS Linux release 7.4.1708 (Core) 64Bits
> CPU: Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz - 15 Cores
> Memory: 240GB
> I/O: 17TB RAID0 (17 Disks (1TB for disk) SSD with read and write cache
> enabled)
>
> *Server StandBy*
> SO: CentOS Linux release 7.4.1708 (Core) 64Bits
> CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz - 20 Cores
> Memory: 140GB
> I/O: 22TB RAID0 (22 Disks (1TB for disk) SSD with read and write cache
> enabled)
>
> *Configuration PostgreSQL Production*
>
> postgres.conf
>
> listen_addresses = '*'
> port = 5433
> max_connections = 2000
> superuser_reserved_connections = 3
> shared_buffers = 56GB
> work_mem = 29360kB
> maintenance_work_mem = 4GB
> autovacuum_work_mem = 8GB
> max_stack_depth = 5MB
> dynamic_shared_memory_type = posix
> bgwriter_delay = 10ms
> bgwriter_lru_maxpages = 700
> bgwriter_lru_multiplier = 2.0
> fsync = off
> synchronous_commit = on
> full_page_writes = off
> wal_buffers = 1500MB
> wal_writer_delay = 1ms
> checkpoint_timeout = 10min
> max_wal_size = 4GB
> min_wal_size = 2GB
> checkpoint_completion_target = 0.9
> effective_cache_size = 168GB
> default_statistics_target = 500
> log_destination = 'csvlog'
> logging_collector = on
> log_directory = 'pg_log'
> log_filename = 'postgresql-%w.log'
> log_file_mode = 0640
> log_truncate_on_rotation = on
> log_rotation_age = 1d
> log_rotation_size = 600MB
> log_min_duration_statement = 0
> log_checkpoints = off
> log_connections = off
> log_disconnections = off
> log_duration = off
> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
> log_lock_waits = on
> log_timezone = 'Brazil/East'
> autovacuum = on
> log_autovacuum_min_duration = -1
> autovacuum_max_workers = 12
> autovacuum_naptime = 30s
> autovacuum_vacuum_threshold = 20
> autovacuum_analyze_threshold = 20
> autovacuum_vacuum_scale_factor = 0.2
> autovacuum_analyze_scale_factor = 0.1
> autovacuum_vacuum_cost_delay = 20
> autovacuum_vacuum_cost_limit = 800
> vacuum_cost_delay = 20
> vacuum_cost_limit = 800
> datestyle = 'iso, mdy'
> timezone = 'Brazil/East'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
> default_text_search_config = 'pg_catalog.english'
> max_locks_per_transaction = 256
> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
> effective_io_concurrency = 10
> wal_level = hot_standby
> max_wal_senders = 10
> max_replication_slots = 3
> wal_keep_segments = 38400
>
> *Configuration PostgreSQL StandBy*
>
> postgres.conf
>
> listen_addresses = '*'
> port = 5433
> max_connections = 2000
> superuser_reserved_connections = 3
> shared_buffers = 10GB
> work_mem = 183500kB
> maintenance_work_mem = 4GB
> autovacuum_work_mem = 8GB
> max_stack_depth = 5MB
> dynamic_shared_memory_type = posix
> bgwriter_delay = 10ms
> bgwriter_lru_maxpages = 700
> bgwriter_lru_multiplier = 2.0
> fsync = off
> synchronous_commit = on
> full_page_writes = off
> wal_buffers = 16MB
> wal_writer_delay = 1ms
> checkpoint_timeout = 10min
> max_wal_size = 4GB
> min_wal_size = 2GB
> checkpoint_completion_target = 0.9
> effective_cache_size = 104GB
> default_statistics_target = 500
> log_destination = 'csvlog'
> logging_collector = on
> log_directory = 'pg_log'
> log_filename = 'postgresql-%w.log'
> log_file_mode = 0640
> log_truncate_on_rotation = on
> log_rotation_age = 1d
> log_rotation_size = 600MB
> log_min_duration_statement = 0
> log_checkpoints = off
> log_connections = off
> log_disconnections = off
> log_duration = off
> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
> log_lock_waits = on
> log_timezone = 'Brazil/East'
> autovacuum = on
> log_autovacuum_min_duration = -1
> autovacuum_max_workers = 3
> autovacuum_naptime = 30s
> autovacuum_vacuum_threshold = 20
> autovacuum_analyze_threshold = 20
> autovacuum_vacuum_scale_factor = 0.2
> autovacuum_analyze_scale_factor = 0.1
> autovacuum_vacuum_cost_delay = 20
> autovacuum_vacuum_cost_limit = 800
> vacuum_cost_delay = 20
> vacuum_cost_limit = 800
> datestyle = 'iso, mdy'
> timezone = 'Brazil/East'
> lc_messages = 'en_US.UTF-8'
> lc_monetary = 'en_US.UTF-8'
> lc_numeric = 'en_US.UTF-8'
> lc_time = 'en_US.UTF-8'
> default_text_search_config = 'pg_catalog.english'
> max_locks_per_transaction = 256
> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
> effective_io_concurrency = 10
> max_worker_processes = 18
> hot_standby = on
> hot_standby_feedback = on
>
> recovery.conf
>
> standby_mode = 'on'
> primary_slot_name = 'replicacao'
> primary_conninfo = 'host=bdreplica00 port=5433 user=replicator password='
> trigger_file = '/var/lib/pgsql/9.6/data/master_db.conf'
>
> --
>
>
> Marcelo Krüger <https://br.linkedin.com/in/marcelo-kr%C3%BCger-40aab127>
>
> Data Integration
>
> Office: +55 48 3333-2030 | Mobile: +55 48 999679585
>
> R. Patrício Farias, 131 - Itacorubi - Florianópolis - 3º andar | 88034-132
> | Santa Catarina, Brasil
> <https://maps.google.com/?q=R.+Patr%C3%ADcio+Farias,+131+-+Itacorubi+-+Florian%C3%B3polis+-+3%C2%BA+andar+%7C+%E2%80%A888034-132+%7C+Santa+Catarina,+Brasil&entry=gmail&source=g>
>
> São Paulo | Florianópolis | New York
>
> <https://www.linkedin.com/company/1235825/>
> <https://www.facebook.com/neoway.nw/>
>
> Conteúdo confidencial. Caso você não seja o real destinatário deste e-mail
> por favor notifique o remetente e elimine esta mensagem. Privileged and
> confidential. If you are not the intended addressee of this e-mail please
> notify the sender and promptly delete this message.
>
>
Hi Marcelo Kruger,

From server configuration details provided, looks like a heavy trans DB &
more logging enabled which might be resulting more write act on Master and
Slave.
At our first glance, we see *log_min_duration_statement* is set to 0 which
might be writing all the queries into the log resulting heavy writing
activity that maybe cause for replication lag.

Hope this helps you.

--

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Igor Neyman 2017-11-08 15:39:33 Re: How to measure replication lag in logical replication setup.
Previous Message Douglas Reed 2017-11-08 14:06:26 Problems rebuilding slave using pg_basebackup