From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | robertmhaas(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Race condition in recovery? |
Date: | 2021-06-11 05:26:44 |
Message-ID: | 20210611.142644.1872001951622668861.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
At Fri, 11 Jun 2021 14:07:45 +0900 (JST), Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote in
> At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> > conchuela's failure is evidently not every time, but this test
> > definitely postdates the "fix":
conchuela failed recovery_check this time, and
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-10%2014%3A09%3A08
> So the standby2 was stuck after selecting the new timeline and before
> updating control file and its postmaster couldn't even respond to
> SIGQUIT.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-09%2021%3A12%3A25
This run is from before the "fix".
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-08%2014%3A07%3A46
failed in pg_verifybackupCheck
> ==~_~===-=-===~_~== pgsql.build/src/bin/pg_verifybackup/tmp_check/log/regress_log_003_corruption ==~_~===-=-===~_~==
...
> # Failed test 'base backup ok'
> # at t/003_corruption.pl line 115.
> # Running: pg_verifybackup /home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_003_corruption_primary_data/backup/open_directory_fails
> pg_verifybackup: fatal: could not open file "/home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_verifybackup/tmp_check/t_003_corruption_primary_data/backup/open_directory_fails/backup_manifest": No such file or directory
> not ok 38 - intact backup verified
The manifest file is missing from the backup. In this case, too, the
servers failed to handle SIGQUIT.
> ==~_~===-=-===~_~== pgsql.build/src/bin/pg_verifybackup/tmp_check/log/003_corruption_primary.log ==~_~===-=-===~_~==
...
> 2021-06-08 16:17:41.706 CEST [51792:9] 003_corruption.pl LOG: received replication command: START_REPLICATION SLOT "pg_basebackup_51792" 0/B000000 TIMELINE 1
> 2021-06-08 16:17:41.706 CEST [51792:10] 003_corruption.pl STATEMENT: START_REPLICATION SLOT "pg_basebackup_51792" 0/B000000 TIMELINE 1
(log ends here)
Could this be some kind of hardware failure?
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center