WAL replay stuck in hot standby when performing the backup

From: Andrea Moretto <moretto(dot)andrea(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: WAL replay stuck in hot standby when performing the backup
Date: 2019-07-16 13:14:56
Message-ID: 92CEF44C-A3DD-433D-8CD7-B96EA01A2AA5@gmail.com
Lists: pgsql-admin

Hi there,

I have noticed that WAL replay blocks on a hot standby slave while a backup is running.
Both nodes are running PostgreSQL 9.4.23 on Red Hat.

During the whole backup procedure (which takes more than 24 hours), the data on the hot standby node is not updated.

Is this behaviour correct?

My idea is to confine batch workloads, backups and less critical OLTP queries to the hot standby node.
If live data is not updated on the hot standby during the backup, I will have to re-plan the environment.
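
To rule out that recovery has simply been paused by hand, I also plan to run the following on the standby (if I remember the 9.4 function name correctly):

select pg_is_xlog_replay_paused();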

Details:

- Replication statistics show that the replay location is lagging far behind the flush location (on the master side):

select * from pg_stat_replication;

pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
-------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
15477 | 10 | postgres | walreceiver | 127.0.0.1 | | 50612 | 2019-07-15 17:34:36.759306+00 | 1152223785 | streaming | 274D/804CBCB8 | 274D/804CBCB8 | 274D/804CBCB8 | 2744/30F84028 | 0 | async
(1 row)
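
If it is easier to read the gap in bytes rather than comparing LSNs, I believe this can be computed on the master from the same view (assuming pg_size_pretty accepts the numeric result on 9.4; otherwise a cast to bigint should do):

select pg_size_pretty(pg_xlog_location_diff(sent_location, replay_location)) as replay_gap
from pg_stat_replication;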

- Replay lag is about 19 hours and still growing (on the hot standby side):

select now()-pg_last_xact_replay_timestamp() as replication_lag;

replication_lag
-----------------
19:12:11.2289
(1 row)

SELECT
pg_last_xlog_receive_location() receive,
pg_last_xlog_replay_location() replay,
(
extract(epoch FROM now()) -
extract(epoch FROM pg_last_xact_replay_timestamp())
)::int lag;

receive | replay | lag
---------------+---------------+-------
274E/BAF0B610 | 2744/30F84028 | 69157
(1 row)
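
The same gap, measured directly on the standby in bytes rather than seconds, should be (if I am not mistaken about the 9.4 function names):

select pg_size_pretty(
         pg_xlog_location_diff(pg_last_xlog_receive_location(),
                               pg_last_xlog_replay_location())
       ) as replay_gap;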

- The only query currently running (not in an idle state) on the slave is a COPY:

select * from pg_stat_activity;

datid | datname | pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | state_change | waiting | state | backend_xid | backend_xmin | query
-------+-------------+-------+----------+----------------+------------------------+--------------+-----------------+-------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+---------+--------+-------------+--------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
16993 | mydb | 15604 | 3226933 | rousr | PostgreSQL JDBC Driver | 10.192.87.91 | | 41491 | 2019-07-15 17:35:57.01458+00 | | 2019-07-16 12:55:58.21023+00 | 2019-07-16 12:55:58.210251+00 | f | idle | | | COMMIT
16993 | mydb | 18868 | 3226603 | amoretto | psql93 | 10.174.62.66 | | 54106 | 2019-07-16 12:59:30.122865+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16 13:00:18.807889+00 | 2019-07-16 13:00:18.807893+00 | f | active | | 1152301732 | select * from pg_stat_activity;
16993 | mydb | 2559 | 3226603 | amoretto | psql93 | 10.174.62.66 | | 52338 | 2019-07-16 10:57:03.118782+00 | | 2019-07-16 10:58:27.278095+00 | 2019-07-16 10:58:27.279752+00 | f | idle | | | select * from pg_stat_replication;
16993 | mydb | 15736 | 3226603 | amoretto | pg_dump | 10.192.87.90 | | 4281 | 2019-07-15 17:36:42.540157+00 | 2019-07-15 17:36:42.566704+00 | 2019-07-16 12:51:04.488305+00 | 2019-07-16 12:51:04.488307+00 | f | active | | 1152223785 | COPY public.c_box_hw_dev_app_hist (box_id, hardware_id, device_id, application_id, sequence, history_time, history_info, usr_id, usr_info, archived, is_history_info_zip, description, parent_application_name, iid) TO stdout;
(4 rows)
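
My guess is that the long-lived snapshot of this pg_dump transaction is what holds back replay, so I also want to see how long the transactions on the standby have been open (this should work on 9.4):

select pid, usename, application_name,
       now() - xact_start as xact_age,
       state
from pg_stat_activity
where xact_start is not null
order by xact_start;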

select * from pg_stat_database_conflicts;
datid | datname | confl_tablespace | confl_lock | confl_snapshot | confl_bufferpin | confl_deadlock
-------+-------------+------------------+------------+----------------+-----------------+----------------
1 | template1 | 0 | 0 | 0 | 0 | 0
12998 | template0 | 0 | 0 | 0 | 0 | 0
13003 | postgres | 0 | 0 | 0 | 0 | 0
16993 | mydb | 0 | 0 | 0 | 0 | 0
(4 rows)
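
All conflict counters are zero. My guess (still to be verified) is that max_standby_streaming_delay is set so that recovery waits for conflicting queries instead of cancelling them, which would explain why replay simply stalls. I will check the relevant settings on the standby with:

select name, setting, unit
from pg_settings
where name in ('max_standby_streaming_delay',
               'max_standby_archive_delay',
               'hot_standby_feedback');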

select * from pg_stat_database;
datid | datname | numbackends | xact_commit | xact_rollback | blks_read | blks_hit | tup_returned | tup_fetched | tup_inserted | tup_updated | tup_deleted | conflicts | temp_files | temp_bytes | deadlocks | blk_read_time | blk_write_time | stats_reset
-------+-------------+-------------+-------------+---------------+-----------+----------+--------------+-------------+--------------+-------------+-------------+-----------+------------+------------+-----------+---------------+----------------+-------------------------------
1 | template1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
12998 | template0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
13003 | postgres | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
16993 | mydb | 3 | 2710 | 2 | 2499 | 333071 | 213588 | 72658 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2019-07-15 17:31:51.600354+00
(4 rows)

select * from pg_stat_bgwriter;
checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc | stats_reset
-------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+-------------------------------
4636 | 1 | 1107064 | 9 | 378395 | 0 | 0 | 460498 | 0 | 1432887 | 2019-07-15 17:31:22.319099+00
(1 row)

I hope someone can give me a hint on this topic; it would be much appreciated!

Thanks in advance,

Andrea

Andrea Moretto
moretto(dot)andrea(at)gmail(dot)com
-------------------------------------------------------
CONFIDENTIALITY NOTICE
This message and its attachments are addressed solely to the persons
above and may contain confidential information. If you have received
the message in error, be informed that any use of the content hereof
is prohibited. Please return it immediately to the sender and delete
the message.
-------------------------------------------------------
