From: Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>
To: Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: High replication lag - Stream Replication
Date: 2017-11-13 12:59:23
Message-ID: CAGDYbUPJ-hP6TMOLosK9=ZQC63q5YrJmBheb4L4PrL47Kx+zmg@mail.gmail.com
Lists: pgsql-admin
On Thu, Nov 9, 2017 at 5:36 PM, Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br> wrote:
> Hi Shreeyansh,
>
> I changed the log_min_duration_statement parameter to -1, but there was no
> improvement. The application of WAL archives by streaming replication is
> still slow.
>
> In our usage scenario we perform large table-creation operations: several
> tables of more than 40GB each are created within a short time. The
> database currently grows by about 500GB per month.
>
> A single day's transactions can generate more than 30000 WAL archives
> (480GB).
>
> Are there any other parameters that could be changed to improve the
> performance of applying the archives?
>
> 2017-11-09 2:43 GMT-02:00 Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>:
>
>> On Wed, Nov 8, 2017 at 9:34 PM, Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br> wrote:
>>
>>> Hi Shreeyansh,
>>>
>>> Thank you for your help.
>>>
>>> The lag is not occurring while writing the WAL archives in production,
>>> nor while shipping them to the standby. The lag occurs while applying
>>> the archives on the standby, and this slowness increases when queries
>>> are running on the standby database.
>>>
>>> The recovery process is consuming little CPU and little I/O.
>>>
>>> postgres: startup process recovering 0000000100001B4300000006
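>>>
>>> For reference, the replay gap on the standby can be confirmed with
>>> something like the following (a sketch using the 9.6 function names):
>>>
>>>   -- Run on the standby: WAL received but not yet replayed, in bytes
>>>   SELECT pg_last_xlog_receive_location() AS received,
>>>          pg_last_xlog_replay_location()  AS replayed,
>>>          pg_xlog_location_diff(pg_last_xlog_receive_location(),
>>>                                pg_last_xlog_replay_location()) AS gap_bytes;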
>>>
>>> Should the *log_min_duration_statement* parameter be changed on the
>>> StandBy server?
>>>
>>
>> The parameter should be changed on both the master and the slave, as the
>> current setting results in a heavy load on the servers, causing slower
>> query execution due to higher resource consumption.
>>
>> This parameter is normally set to a higher value so that it captures only
>> bottleneck queries, which minimizes write activity and load on the
>> servers.
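>>
>> For example (illustrative values; 0 is what the configurations shown in
>> this thread use):
>>
>>   log_min_duration_statement = 0     # logs every completed statement
>>   log_min_duration_statement = -1    # disables duration-based logging
>>   log_min_duration_statement = 5000  # logs only statements slower than 5s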
>>
>>
>>
>>
>>>
>>> 2017-11-08 12:09 GMT-02:00 Shreeyansh Dba <shreeyansh2014(at)gmail(dot)com>:
>>>
>>>>
>>>>
>>>> On Wed, Nov 8, 2017 at 7:08 PM, Marcelo Kruger <marcelo(dot)kruger(at)neoway(dot)com(dot)br> wrote:
>>>>
>>>>> Good afternoon,
>>>>>
>>>>> I have two servers hosted on Azure with PostgreSQL 9.6 databases. One
>>>>> database is production, the other is a standby.
>>>>> For replication I use streaming replication between the servers.
>>>>> However, applying the archives on the standby database is slow, which
>>>>> constantly generates replication lag.
>>>>>
>>>>> The replication lag increases considerably when queries are performed
>>>>> on the standby database. I wonder if there is any setting that can be
>>>>> changed to improve this performance.
>>>>>
>>>>> *Production Server*
>>>>> OS: CentOS Linux release 7.4.1708 (Core) 64 bits
>>>>> CPU: Intel(R) Xeon(R) CPU E5-2698B v3 @ 2.00GHz - 15 cores
>>>>> Memory: 240GB
>>>>> I/O: 17TB RAID0 (17 SSD disks, 1TB per disk, with read and write cache
>>>>> enabled)
>>>>>
>>>>> *StandBy Server*
>>>>> OS: CentOS Linux release 7.4.1708 (Core) 64 bits
>>>>> CPU: Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz - 20 cores
>>>>> Memory: 140GB
>>>>> I/O: 22TB RAID0 (22 SSD disks, 1TB per disk, with read and write cache
>>>>> enabled)
>>>>>
>>>>> *Production PostgreSQL Configuration*
>>>>>
>>>>> postgresql.conf
>>>>>
>>>>> listen_addresses = '*'
>>>>> port = 5433
>>>>> max_connections = 2000
>>>>> superuser_reserved_connections = 3
>>>>> shared_buffers = 56GB
>>>>> work_mem = 29360kB
>>>>> maintenance_work_mem = 4GB
>>>>> autovacuum_work_mem = 8GB
>>>>> max_stack_depth = 5MB
>>>>> dynamic_shared_memory_type = posix
>>>>> bgwriter_delay = 10ms
>>>>> bgwriter_lru_maxpages = 700
>>>>> bgwriter_lru_multiplier = 2.0
>>>>> fsync = off
>>>>> synchronous_commit = on
>>>>> full_page_writes = off
>>>>> wal_buffers = 1500MB
>>>>> wal_writer_delay = 1ms
>>>>> checkpoint_timeout = 10min
>>>>> max_wal_size = 4GB
>>>>> min_wal_size = 2GB
>>>>> checkpoint_completion_target = 0.9
>>>>> effective_cache_size = 168GB
>>>>> default_statistics_target = 500
>>>>> log_destination = 'csvlog'
>>>>> logging_collector = on
>>>>> log_directory = 'pg_log'
>>>>> log_filename = 'postgresql-%w.log'
>>>>> log_file_mode = 0640
>>>>> log_truncate_on_rotation = on
>>>>> log_rotation_age = 1d
>>>>> log_rotation_size = 600MB
>>>>> log_min_duration_statement = 0
>>>>> log_checkpoints = off
>>>>> log_connections = off
>>>>> log_disconnections = off
>>>>> log_duration = off
>>>>> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
>>>>> log_lock_waits = on
>>>>> log_timezone = 'Brazil/East'
>>>>> autovacuum = on
>>>>> log_autovacuum_min_duration = -1
>>>>> autovacuum_max_workers = 12
>>>>> autovacuum_naptime = 30s
>>>>> autovacuum_vacuum_threshold = 20
>>>>> autovacuum_analyze_threshold = 20
>>>>> autovacuum_vacuum_scale_factor = 0.2
>>>>> autovacuum_analyze_scale_factor = 0.1
>>>>> autovacuum_vacuum_cost_delay = 20
>>>>> autovacuum_vacuum_cost_limit = 800
>>>>> vacuum_cost_delay = 20
>>>>> vacuum_cost_limit = 800
>>>>> datestyle = 'iso, mdy'
>>>>> timezone = 'Brazil/East'
>>>>> lc_messages = 'en_US.UTF-8'
>>>>> lc_monetary = 'en_US.UTF-8'
>>>>> lc_numeric = 'en_US.UTF-8'
>>>>> lc_time = 'en_US.UTF-8'
>>>>> default_text_search_config = 'pg_catalog.english'
>>>>> max_locks_per_transaction = 256
>>>>> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
>>>>> effective_io_concurrency = 10
>>>>> wal_level = hot_standby
>>>>> max_wal_senders = 10
>>>>> max_replication_slots = 3
>>>>> wal_keep_segments = 38400
>>>>>
>>>>> *StandBy PostgreSQL Configuration*
>>>>>
>>>>> postgresql.conf
>>>>>
>>>>> listen_addresses = '*'
>>>>> port = 5433
>>>>> max_connections = 2000
>>>>> superuser_reserved_connections = 3
>>>>> shared_buffers = 10GB
>>>>> work_mem = 183500kB
>>>>> maintenance_work_mem = 4GB
>>>>> autovacuum_work_mem = 8GB
>>>>> max_stack_depth = 5MB
>>>>> dynamic_shared_memory_type = posix
>>>>> bgwriter_delay = 10ms
>>>>> bgwriter_lru_maxpages = 700
>>>>> bgwriter_lru_multiplier = 2.0
>>>>> fsync = off
>>>>> synchronous_commit = on
>>>>> full_page_writes = off
>>>>> wal_buffers = 16MB
>>>>> wal_writer_delay = 1ms
>>>>> checkpoint_timeout = 10min
>>>>> max_wal_size = 4GB
>>>>> min_wal_size = 2GB
>>>>> checkpoint_completion_target = 0.9
>>>>> effective_cache_size = 104GB
>>>>> default_statistics_target = 500
>>>>> log_destination = 'csvlog'
>>>>> logging_collector = on
>>>>> log_directory = 'pg_log'
>>>>> log_filename = 'postgresql-%w.log'
>>>>> log_file_mode = 0640
>>>>> log_truncate_on_rotation = on
>>>>> log_rotation_age = 1d
>>>>> log_rotation_size = 600MB
>>>>> log_min_duration_statement = 0
>>>>> log_checkpoints = off
>>>>> log_connections = off
>>>>> log_disconnections = off
>>>>> log_duration = off
>>>>> log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h'
>>>>> log_lock_waits = on
>>>>> log_timezone = 'Brazil/East'
>>>>> autovacuum = on
>>>>> log_autovacuum_min_duration = -1
>>>>> autovacuum_max_workers = 3
>>>>> autovacuum_naptime = 30s
>>>>> autovacuum_vacuum_threshold = 20
>>>>> autovacuum_analyze_threshold = 20
>>>>> autovacuum_vacuum_scale_factor = 0.2
>>>>> autovacuum_analyze_scale_factor = 0.1
>>>>> autovacuum_vacuum_cost_delay = 20
>>>>> autovacuum_vacuum_cost_limit = 800
>>>>> vacuum_cost_delay = 20
>>>>> vacuum_cost_limit = 800
>>>>> datestyle = 'iso, mdy'
>>>>> timezone = 'Brazil/East'
>>>>> lc_messages = 'en_US.UTF-8'
>>>>> lc_monetary = 'en_US.UTF-8'
>>>>> lc_numeric = 'en_US.UTF-8'
>>>>> lc_time = 'en_US.UTF-8'
>>>>> default_text_search_config = 'pg_catalog.english'
>>>>> max_locks_per_transaction = 256
>>>>> pgpool.pg_ctl = '/usr/pgsql-9.6/bin/pg_ctl'
>>>>> effective_io_concurrency = 10
>>>>> max_worker_processes = 18
>>>>> hot_standby = on
>>>>> hot_standby_feedback = on
>>>>>
>>>>> recovery.conf
>>>>>
>>>>> standby_mode = 'on'
>>>>> primary_slot_name = 'replicacao'
>>>>> primary_conninfo = 'host=bdreplica00 port=5433 user=replicator password='
>>>>> trigger_file = '/var/lib/pgsql/9.6/data/master_db.conf'
>>>>>
>>>>>
>>>> Hi Marcelo Kruger,
>>>>
>>>> From the server configuration details provided, this looks like a
>>>> heavily transactional DB with extensive logging enabled, which might be
>>>> producing extra write activity on both the master and the slave.
>>>> At first glance, we see *log_min_duration_statement* is set to 0, which
>>>> writes every query to the log, resulting in heavy write activity that
>>>> may be the cause of the replication lag.
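>>>>
>>>> As a sketch of the kind of change meant here (the 5000ms threshold is
>>>> only illustrative), raise the value in postgresql.conf on both servers
>>>> and reload:
>>>>
>>>>   log_min_duration_statement = 5000   # log only statements slower than 5s
>>>>
>>>>   -- then, from psql on each server:
>>>>   SELECT pg_reload_conf();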
>>>>
>>>>
>>>> Hope this helps you.
>>>>
>>>>
>
>
>
Further to our analysis, we fear this may not be just a parameter change.
We might have to build a strategy around DB archiving, so that bulk
population of multiple tables happens during off-peak hours, avoiding
excessive generation of archives during peak hours.
However, a high-volume database like this needs careful planning and deep
investigation in order to reduce archive generation.
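
For example, to estimate when archive generation peaks, the WAL position
can be sampled on the master at intervals (a rough sketch using the 9.6
function names; the LSNs in the second query are placeholders):

  -- Run on the master, repeatedly, noting the time of each sample:
  SELECT now() AS sampled_at, pg_current_xlog_location() AS wal_location;

  -- Bytes of WAL generated between two sampled locations:
  SELECT pg_xlog_location_diff('1B43/06000000', '1B42/F0000000') AS wal_bytes;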
--
<http://www.shreeyansh.com>