| From: | "higuchi(dot)daisuke(at)fujitsu(dot)com" <higuchi(dot)daisuke(at)fujitsu(dot)com> |
|---|---|
| To: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | [Bug fix]There is the case archive_timeout parameter is ignored after recovery works. |
| Date: | 2020-06-29 04:35:11 |
| Message-ID: | OSBPR01MB1751EABF275BAE92EAA854FBEC6E0@OSBPR01MB1751.jpnprd01.prod.outlook.com |
| Lists: | pgsql-hackers |
Hi,
I found a bug related to the archive_timeout parameter.
There are cases where archive_timeout is ignored after recovery has run.
[Problem]
When archive_timeout is smaller than checkpoint_timeout and recovery has run, archive_timeout is ignored for the first WAL archiving after recovery.
Once a WAL segment has been archived, archive_timeout seems to take effect from then on.
I have attached a simple script that reproduces this problem on version 12.
I confirmed the problem on PostgreSQL 10, 11, and 12, and I think the other supported versions are affected as well.
[Investigation]
CheckpointerMain() calculates the time (cur_timeout) to wait in WaitLatch:
-----------------------------------------------------------------
now = (pg_time_t) time(NULL);
elapsed_secs = now - last_checkpoint_time;
if (elapsed_secs >= CheckPointTimeout)
    continue;           /* no sleep for us ... */
cur_timeout = CheckPointTimeout - elapsed_secs;
if (XLogArchiveTimeout > 0 && !RecoveryInProgress())
{
    elapsed_secs = now - last_xlog_switch_time;
    if (elapsed_secs >= XLogArchiveTimeout)
        continue;       /* no sleep for us ... */
    cur_timeout = Min(cur_timeout, XLogArchiveTimeout - elapsed_secs);
}

(void) WaitLatch(MyLatch,
                 WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                 cur_timeout * 1000L /* convert to ms */ ,
                 WAIT_EVENT_CHECKPOINTER_MAIN);
-----------------------------------------------------------------
Currently, while recovery is in progress, cur_timeout is computed from checkpoint_timeout alone, because the archive_timeout branch is skipped when RecoveryInProgress() returns true.
I think cur_timeout should take archive_timeout into account as well as checkpoint_timeout, even during recovery.
I have attached a patch to solve this problem.
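For illustration only (this is not the attached patch), the timeout calculation can be sketched as a standalone function. The helper name compute_sleep_secs and the flag honor_archive_timeout_in_recovery are hypothetical and do not exist in PostgreSQL; the flag simply switches between the current behavior and the behavior proposed above. A return value of 0 stands in for the "continue; /* no sleep for us ... */" case.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not a PostgreSQL function: returns the number of
 * seconds the checkpointer would sleep, or 0 when it should not sleep.
 * All times and timeouts are in seconds.
 */
static long
compute_sleep_secs(long now,
                   long last_checkpoint_time,
                   long last_xlog_switch_time,
                   long checkpoint_timeout,
                   long archive_timeout,
                   bool in_recovery,
                   bool honor_archive_timeout_in_recovery)
{
    long elapsed_secs;
    long cur_timeout;

    assert(checkpoint_timeout > 0);

    /* Time remaining until the next timed checkpoint. */
    elapsed_secs = now - last_checkpoint_time;
    if (elapsed_secs >= checkpoint_timeout)
        return 0;               /* no sleep */
    cur_timeout = checkpoint_timeout - elapsed_secs;

    /*
     * Current code skips this branch whenever recovery is in progress.
     * The assumed flag lets the branch run during recovery as well.
     */
    if (archive_timeout > 0 &&
        (!in_recovery || honor_archive_timeout_in_recovery))
    {
        elapsed_secs = now - last_xlog_switch_time;
        if (elapsed_secs >= archive_timeout)
            return 0;           /* no sleep */
        if (archive_timeout - elapsed_secs < cur_timeout)
            cur_timeout = archive_timeout - elapsed_secs;
    }

    return cur_timeout;
}
```

With checkpoint_timeout = 300 and archive_timeout = 10, calling this during recovery with the flag off yields a 300-second sleep, so a sleep that starts just before recovery ends can outlast archive_timeout by a wide margin. With the flag on, the sleep is capped at 10 seconds.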
Regards,
Daisuke, Higuchi
| Attachment | Content-Type | Size |
|---|---|---|
| archive_timeout_test.sh | application/octet-stream | 994 bytes |
| archive_timeout.patch | application/octet-stream | 760 bytes |