Resetting synchronous_standby_names can wait for CHECKPOINT to finish

From:	"Yusuke Egashira (Fujitsu)" <egashira(dot)yusuke(at)fujitsu(dot)com>
To:	"'pgsql-hackers(at)postgresql(dot)org'" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Resetting synchronous_standby_names can wait for CHECKPOINT to finish
Date:	2024-04-15 01:52:33
Message-ID:	TY3PR01MB996612E799EACC4FE1C9C4DCFF092@TY3PR01MB9966.jpnprd01.prod.outlook.com
Lists:	pgsql-hackers

Hello, hackers.

When the checkpointer process is busy, backend processes waiting in SyncRep do not resume until the checkpoint completes, even if synchronous_standby_names is reset.
This prevents the prompt resumption of application processing when a problem occurs on the standby server in a synchronous replication system.
I confirmed this in PostgreSQL 12.18.

This issue has actually become a major problem for our customer.
When a problem occurred in the replication network, even after resetting synchronous_standby_names, the backend processes did not respond, resulting in timeout errors in many client applications.
The customer has also set the checkpoint_completion_target parameter to 0.9, and it seems to have been working fine under normal conditions.
However, at one point VACUUM activity was concentrated on a huge table, and WAL amounting to more than five times max_wal_size was generated during checkpoint processing.
Unfortunately, communication with the synchronous standby was lost during that checkpoint, and despite synchronous_standby_names being reset, multiple client applications could not return a response because their backends were still waiting in SyncRep.

I wrote a script (reset-synchronous_standby_names-during-checkpoint.sh) to illustrate the issue.
The script stops the synchronous standby during a transaction, and then resets synchronous_standby_names while a checkpoint is running.
When I run this on my 1-core RHEL7 machine, I see that COMMIT waits until the CHECKPOINT finishes, even though synchronous_standby_names has been reset.

I am attaching a patch (against REL_12_STABLE) for what seems to be the simplest solution.
It moves the checkpointer's handling of a received SIGHUP outside of its sleep processing.
However, I am concerned that this change could affect the performance of checkpoint execution when there is a delay in the checkpoint schedule.
Can PostgreSQL tolerate this overhead?
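
To make the timing difference concrete, here is a toy model in Python. It is only a sketch, not the attached patch and not PostgreSQL source: all names (write_delay, run_checkpoint, always_check) are hypothetical, and it assumes that a pending reload is noticed only on the path where the checkpointer naps between writes, which is how the behavior described above reads. When the checkpoint is behind schedule the nap path is never taken, so the reload waits until the checkpoint ends.

```python
def write_delay(state, buf_index, on_schedule, always_check):
    """Toy per-buffer delay hook (hypothetical, not PostgreSQL code).

    state["reload_pending"] stands in for a received SIGHUP.
    always_check=False: the reload is only noticed on the nap path.
    always_check=True:  the reload is noticed on every call.
    """
    def process_reload():
        if state["reload_pending"]:
            state["reload_pending"] = False
            state["reloaded_at"] = buf_index

    if always_check:
        process_reload()          # one extra flag test per call

    if on_schedule:
        if not always_check:
            process_reload()      # only reachable when we are about to nap
        state["naps"] += 1        # stand-in for the short sleep


def run_checkpoint(n_buffers, sighup_at, always_check, on_schedule=False):
    """Return the buffer index at which the reload was processed.

    A return value of n_buffers means the reload was only handled
    after the whole checkpoint had finished.
    """
    state = {"reload_pending": False, "reloaded_at": None, "naps": 0}
    for i in range(n_buffers):
        if i == sighup_at:
            state["reload_pending"] = True   # config reload requested
        write_delay(state, i, on_schedule, always_check)
    if state["reload_pending"]:
        # Fallback: the reload is handled once the checkpoint is over.
        state["reload_pending"] = False
        state["reloaded_at"] = n_buffers
    return state["reloaded_at"]


if __name__ == "__main__":
    # Checkpoint permanently behind schedule, reload requested at buffer 10.
    print("nap path only:", run_checkpoint(1000, 10, always_check=False))
    print("every call:   ", run_checkpoint(1000, 10, always_check=True))
```

In this toy, the added cost of the always-check variant is one flag test per call. Whether the equivalent cost is acceptable in the real checkpointer is exactly the question above.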

Regards,
Yusuke Egashira.

Attachment	Content-Type	Size
reset-synchronous_standby_names-during-checkpoint.sh	application/octet-stream	1.4 KB
v1-reset-synchronous_standby_names-timing.patch	application/octet-stream	1022 bytes

Responses

RE: Resetting synchronous_standby_names can wait for CHECKPOINT to finish at 2024-05-14 00:12:42 from Yusuke Egashira (Fujitsu)
