Re: [ADMIN] High replication lag - Stream Replication

From: Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br>
To: Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: [ADMIN] High replication lag - Stream Replication
Date: 2017-12-10 21:59:36
Message-ID: CAN8toQP9Oyh5r64vK7GazWP=9SKjBjRAgZo4pXvFvf_UE9vynA@mail.gmail.com
Lists: pgsql-admin

We were able to reduce the volume of generated WAL files. In some cases we
use UNLOGGED tables to help with WAL reduction. But I still find it strange
that the recovery process on the standby database uses so little CPU. Today
the process uses between 20% and 50% of a single CPU, far from all the
processing power the machine offers (20 CPUs, RAID 0 on SSD).
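
For example, something along these lines (the table name is only illustrative);
UNLOGGED tables bypass WAL entirely, so they add nothing to the replication
stream, but they are not crash-safe and their contents are not available on the
standby:

CREATE UNLOGGED TABLE staging_import (
    id      bigint,
    payload text
);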

Is there any configuration that allows Postgres to use more processing power
when applying WAL on the standby database?
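
For reference, one way to see how far WAL replay is behind WAL receipt on the
standby (using the 9.6 function names; they were renamed in PostgreSQL 10):

-- run on the standby
SELECT pg_last_xlog_receive_location() AS received_lsn,
       pg_last_xlog_replay_location()  AS replayed_lsn,
       pg_xlog_location_diff(pg_last_xlog_receive_location(),
                             pg_last_xlog_replay_location()) AS replay_lag_bytes,
       now() - pg_last_xact_replay_timestamp()               AS replay_delay;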

PostgreSQL configuration file on the StandBy:

listen_addresses = '*'
port = 5433
max_connections = 2000
superuser_reserved_connections = 3
shared_buffers = 10GB
work_mem = 183500kB
maintenance_work_mem = 4GB
autovacuum_work_mem = 8GB
max_stack_depth = 5MB
dynamic_shared_memory_type = posix
bgwriter_delay = 10ms
bgwriter_lru_maxpages = 700
bgwriter_lru_multiplier = 2.0
fsync = off
synchronous_commit = on
full_page_writes = off
wal_buffers = 16MB
wal_writer_delay = 1ms
checkpoint_timeout = 10min
max_wal_size = 4GB
min_wal_size = 2GB
checkpoint_completion_target = 0.9
effective_cache_size = 104GB
default_statistics_target = 500
log_destination = 'csvlog'
logging_collector = on
log_directory = 'pg_log'
log_filename = 'postgresql-%w.log'
log_file_mode = 0640
log_truncate_on_rotation = on
log_rotation_age = 1d
log_rotation_size = 600MB
log_min_duration_statement = 0
log_checkpoints = off
log_connections = off
log_disconnections = off
log_duration = off
log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
log_lock_waits = on
log_timezone = 'Brazil/East'
autovacuum = on
log_autovacuum_min_duration = -1
autovacuum_max_workers = 3
autovacuum_naptime = 30s
autovacuum_vacuum_threshold = 20
autovacuum_analyze_threshold = 20
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_cost_delay = 20
autovacuum_vacuum_cost_limit = 800
vacuum_cost_delay = 20
vacuum_cost_limit = 800
datestyle = 'iso, mdy'
timezone = 'Brazil/East'
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
default_text_search_config = 'pg_catalog.english'
max_locks_per_transaction = 256
pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
effective_io_concurrency = 10
max_worker_processes = 18
hot_standby = on

2017-11-13 10:59 GMT-02:00 Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>:

>
>
> On Thu, Nov 9, 2017 at 5:36 PM, Marcelo Kruger <
> marcelo(dot)kruger(at)neoway(dot)com(dot)br> wrote:
>
>> Hi Shreeyansh,
>>
>> I changed the log_min_duration_statement parameter to -1, but there was
>> no improvement. Applying the archives through streaming replication is
>> still slow.
>>
>> In our usage scenario we perform large table-creation operations: we
>> create several tables of more than 40 GB each in a short time. The
>> database currently grows by 500 GB per month.
>>
>> Transactions in one day can generate more than 30000 archives (480GB).
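>>
>> (Assuming the default 16 MB WAL segment size, that is consistent: 30000
>> segments x 16 MB ~ 480 GB per day, or roughly 5.5 MB/s of WAL on average.)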
>>
>> Are there any other parameters that can be changed to improve performance
>> in the application of archives?
>>
>> 2017-11-09 2:43 GMT-02:00 Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>:
>>
>>> On Wed, Nov 8, 2017 at 9:34 PM, Marcelo Kruger <
>>> marcelo(dot)kruger(at)neoway(dot)com(dot)br> wrote:
>>>
>>>> Hi Shreeyansh,
>>>>
>>>> Thank you for your help.
>>>>
>>>> The lag is not occurring when the archive is written in production, nor
>>>> when it is sent to the StandBy. The lag occurs when the archive is applied
>>>> on the StandBy, and this slowness increases when queries are running on the
>>>> StandBy database.
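>>>>
>>>> For reference, one place to check whether WAL replay and standby queries
>>>> are conflicting is the per-database conflict counters (these count queries
>>>> cancelled because of recovery conflicts; a sketch, run on the standby):
>>>>
>>>> SELECT datname, confl_snapshot, confl_lock, confl_bufferpin
>>>> FROM pg_stat_database_conflicts;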
>>>>
>>>> The recovery process is consuming little CPU and little I/O.
>>>>
>>>> postgres: startup process recovering 0000000100001B4300000006
>>>>
>>>> Should this parameter, *log_min_duration_statement*, be changed on the
>>>> StandBy server?
>>>>
>>>
>>> The parameter should be changed on both the master and the slave, since
>>> logging every statement puts heavy load on the server and slows query
>>> execution through higher resource consumption.
>>>
>>> This parameter is normally used to capture bottleneck queries; setting it
>>> to a higher value minimizes write activity and load on the servers.
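>>>
>>> For example, something along these lines in postgresql.conf (the threshold
>>> is only illustrative) keeps statement logging but limits it to slow queries:
>>>
>>> log_min_duration_statement = 5000   # log only statements taking 5s or more
>>> # (or -1 to disable duration-based statement logging entirely)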
>>>
>>>
>>>
>>>
>>>>
>>>> 2017-11-08 12:09 GMT-02:00 Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>:
>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 8, 2017 at 7:08 PM, Marcelo Kruger <
>>>>> marcelo(dot)kruger(at)neoway(dot)com(dot)br> wrote:
>>>>>
>>>>>> Good afternoon,
>>>>>>
>>>>>> I have two servers hosted on Azure with PostgreSQL 9.6 databases. One
>>>>>> database is production, the other is standby.
>>>>>> For replication I use stream replication between servers. However the
>>>>>> application of the archives in the standby database is slow, always
>>>>>> generating replication lag.
>>>>>>
>>>>>> The replication lag increases considerably when queries are performed
>>>>>> on the StandBy database. I wonder if there is any setting that can be made
>>>>>> to improve this performance.
>>>>>>
>>>>>> *Server Production*
>>>>>> OS: CentOS Linux release 7.4.1708 (Core) 64-bit
>>>>>> CPU: Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz - 15 cores
>>>>>> Memory: 240GB
>>>>>> I/O: 17TB RAID 0 (17 x 1TB SSD disks, with read and write cache
>>>>>> enabled)
>>>>>>
>>>>>> *Server StandBy*
>>>>>> OS: CentOS Linux release 7.4.1708 (Core) 64-bit
>>>>>> CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz - 20 cores
>>>>>> Memory: 140GB
>>>>>> I/O: 22TB RAID 0 (22 x 1TB SSD disks, with read and write cache
>>>>>> enabled)
>>>>>>
>>>>>> *Configuration PostgreSQL Production*
>>>>>>
>>>>>> postgres.conf
>>>>>>
>>>>>> listen_addresses = '*'
>>>>>> port = 5433
>>>>>> max_connections = 2000
>>>>>> superuser_reserved_connections = 3
>>>>>> shared_buffers = 56GB
>>>>>> work_mem = 29360kB
>>>>>> maintenance_work_mem = 4GB
>>>>>> autovacuum_work_mem = 8GB
>>>>>> max_stack_depth = 5MB
>>>>>> dynamic_shared_memory_type = posix
>>>>>> bgwriter_delay = 10ms
>>>>>> bgwriter_lru_maxpages = 700
>>>>>> bgwriter_lru_multiplier = 2.0
>>>>>> fsync = off
>>>>>> synchronous_commit = on
>>>>>> full_page_writes = off
>>>>>> wal_buffers = 1500MB
>>>>>> wal_writer_delay = 1ms
>>>>>> checkpoint_timeout = 10min
>>>>>> max_wal_size = 4GB
>>>>>> min_wal_size = 2GB
>>>>>> checkpoint_completion_target = 0.9
>>>>>> effective_cache_size = 168GB
>>>>>> default_statistics_target = 500
>>>>>> log_destination = 'csvlog'
>>>>>> logging_collector = on
>>>>>> log_directory = 'pg_log'
>>>>>> log_filename = 'postgresql-%w.log'
>>>>>> log_file_mode = 0640
>>>>>> log_truncate_on_rotation = on
>>>>>> log_rotation_age = 1d
>>>>>> log_rotation_size = 600MB
>>>>>> log_min_duration_statement = 0
>>>>>> log_checkpoints = off
>>>>>> log_connections = off
>>>>>> log_disconnections = off
>>>>>> log_duration = off
>>>>>> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
>>>>>> log_lock_waits = on
>>>>>> log_timezone = 'Brazil/East'
>>>>>> autovacuum = on
>>>>>> log_autovacuum_min_duration = -1
>>>>>> autovacuum_max_workers = 12
>>>>>> autovacuum_naptime = 30s
>>>>>> autovacuum_vacuum_threshold = 20
>>>>>> autovacuum_analyze_threshold = 20
>>>>>> autovacuum_vacuum_scale_factor = 0.2
>>>>>> autovacuum_analyze_scale_factor = 0.1
>>>>>> autovacuum_vacuum_cost_delay = 20
>>>>>> autovacuum_vacuum_cost_limit = 800
>>>>>> vacuum_cost_delay = 20
>>>>>> vacuum_cost_limit = 800
>>>>>> datestyle = 'iso, mdy'
>>>>>> timezone = 'Brazil/East'
>>>>>> lc_messages = 'en_US.UTF-8'
>>>>>> lc_monetary = 'en_US.UTF-8'
>>>>>> lc_numeric = 'en_US.UTF-8'
>>>>>> lc_time = 'en_US.UTF-8'
>>>>>> default_text_search_config = 'pg_catalog.english'
>>>>>> max_locks_per_transaction = 256
>>>>>> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
>>>>>> effective_io_concurrency = 10
>>>>>> wal_level = hot_standby
>>>>>> max_wal_senders = 10
>>>>>> max_replication_slots = 3
>>>>>> wal_keep_segments = 38400
>>>>>>
>>>>>> *Configuration PostgreSQL StandBy*
>>>>>>
>>>>>> postgres.conf
>>>>>>
>>>>>> listen_addresses = '*'
>>>>>> port = 5433
>>>>>> max_connections = 2000
>>>>>> superuser_reserved_connections = 3
>>>>>> shared_buffers = 10GB
>>>>>> work_mem = 183500kB
>>>>>> maintenance_work_mem = 4GB
>>>>>> autovacuum_work_mem = 8GB
>>>>>> max_stack_depth = 5MB
>>>>>> dynamic_shared_memory_type = posix
>>>>>> bgwriter_delay = 10ms
>>>>>> bgwriter_lru_maxpages = 700
>>>>>> bgwriter_lru_multiplier = 2.0
>>>>>> fsync = off
>>>>>> synchronous_commit = on
>>>>>> full_page_writes = off
>>>>>> wal_buffers = 16MB
>>>>>> wal_writer_delay = 1ms
>>>>>> checkpoint_timeout = 10min
>>>>>> max_wal_size = 4GB
>>>>>> min_wal_size = 2GB
>>>>>> checkpoint_completion_target = 0.9
>>>>>> effective_cache_size = 104GB
>>>>>> default_statistics_target = 500
>>>>>> log_destination = 'csvlog'
>>>>>> logging_collector = on
>>>>>> log_directory = 'pg_log'
>>>>>> log_filename = 'postgresql-%w.log'
>>>>>> log_file_mode = 0640
>>>>>> log_truncate_on_rotation = on
>>>>>> log_rotation_age = 1d
>>>>>> log_rotation_size = 600MB
>>>>>> log_min_duration_statement = 0
>>>>>> log_checkpoints = off
>>>>>> log_connections = off
>>>>>> log_disconnections = off
>>>>>> log_duration = off
>>>>>> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
>>>>>> log_lock_waits = on
>>>>>> log_timezone = 'Brazil/East'
>>>>>> autovacuum = on
>>>>>> log_autovacuum_min_duration = -1
>>>>>> autovacuum_max_workers = 3
>>>>>> autovacuum_naptime = 30s
>>>>>> autovacuum_vacuum_threshold = 20
>>>>>> autovacuum_analyze_threshold = 20
>>>>>> autovacuum_vacuum_scale_factor = 0.2
>>>>>> autovacuum_analyze_scale_factor = 0.1
>>>>>> autovacuum_vacuum_cost_delay = 20
>>>>>> autovacuum_vacuum_cost_limit = 800
>>>>>> vacuum_cost_delay = 20
>>>>>> vacuum_cost_limit = 800
>>>>>> datestyle = 'iso, mdy'
>>>>>> timezone = 'Brazil/East'
>>>>>> lc_messages = 'en_US.UTF-8'
>>>>>> lc_monetary = 'en_US.UTF-8'
>>>>>> lc_numeric = 'en_US.UTF-8'
>>>>>> lc_time = 'en_US.UTF-8'
>>>>>> default_text_search_config = 'pg_catalog.english'
>>>>>> max_locks_per_transaction = 256
>>>>>> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
>>>>>> effective_io_concurrency = 10
>>>>>> max_worker_processes = 18
>>>>>> hot_standby = on
>>>>>> hot_standby_feedback = on
>>>>>>
>>>>>> recovery.conf
>>>>>>
>>>>>> standby_mode = 'on'
>>>>>> primary_slot_name = 'replicacao'
>>>>>> primary_conninfo = 'host=bdreplica00 port=5433 user=replicator
>>>>>> password='
>>>>>> trigger_file = '/var/lib/pgsql/9.6/data/master_db.conf'
>>>>>>
>>>>>>
>>>>> Hi Marcelo Kruger,
>>>>>
>>>>> From the server configuration details provided, this looks like a heavily
>>>>> transactional database with a lot of logging enabled, which may be causing
>>>>> extra write activity on both the master and the slave.
>>>>> At first glance, we see that *log_min_duration_statement* is set to 0,
>>>>> which writes every query to the log; the resulting heavy write activity
>>>>> may be the cause of the replication lag.
>>>>>
>>>>>
>>>>> Hope this helps you.
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> <https://shreeyansh.com/>
>>>
>>
>>
>>
>>
>
> Further to our analysis, we fear this may not be just a parameter change.
>
> We might have to build some strategy around database archiving, so that
> populating multiple tables in a short time happens during off-peak hours,
> avoiding excessive generation of archives during peak hours.
>
> However, a high-volume database like this needs careful planning and deep
> investigation in order to reduce archive generation.
>
> --
>
> <http://www.shreeyansh.com>
>

--

Marcelo Krüger <https://br.linkedin.com/in/marcelo-kr%C3%BCger-40aab127>

Data Integration

Office: +55 48 3333-2030 | Mobile: +55 48 999679585

R. Patrício Farias, 131 - Itacorubi - Florianópolis - 3º andar | 88034-132
| Santa Catarina, Brasil

São Paulo | Florianópolis | New York

<https://www.linkedin.com/company/1235825/>
<https://www.facebook.com/neoway.nw/>

Privileged and confidential. If you are not the intended addressee of this
e-mail please notify the sender and promptly delete this message.
