From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | robertmhaas(at)gmail(dot)com, dilipbalaut(at)gmail(dot)com, hlinnaka(at)iki(dot)fi, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Race condition in recovery? |
Date: | 2021-06-11 05:07:45 |
Message-ID: | 20210611.140745.1586374762292931816.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
At Thu, 10 Jun 2021 21:53:18 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > At Thu, 10 Jun 2021 09:56:51 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in
> >> Thanks for the analysis and the patches. I have committed them.
>
> > Thanks for committing it.
>
> Please note that conchuela and jacana are still failing ...
>
> conchuela's failure is evidently not every time, but this test
> definitely postdates the "fix":
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2021-06-10%2014%3A09%3A08
A different test is failing there. Maybe from different issue.
> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/regress_log_002_archiving ==~_~===-=-===~_~==
...
> # Postmaster PID for node "standby2" is 342349
> ### Promoting node "standby2"
> # Running: pg_ctl -D /home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_standby2_data/pgdata -l /home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/log/002_archiving_standby2.log promote
> waiting for server to promote................................................................................................................ stopped waiting
> pg_ctl: server did not promote in time
> Bail out! system pg_ctl failed
> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/002_archiving_standby2.log ==~_~===-=-===~_~==
...
> 2021-06-10 16:21:21.870 CEST [342350:9] LOG: received promote request
> 2021-06-10 16:21:21.870 CEST [342350:10] LOG: redo done at 0/3030200 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.07 s
> 2021-06-10 16:21:21.870 CEST [342350:11] LOG: last completed transaction was at log time 2021-06-10 16:21:21.010599+02
> 2021-06-10 16:21:21.893 CEST [342350:12] LOG: restored log file "000000010000000000000003" from archive
> cp: /home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/t_002_archiving_primary_data/archives/00000003.history: No such file or directory
> 2021-06-10 16:21:21.896 CEST [342350:13] LOG: selected new timeline ID: 3
(log ends here)
> ==~_~===-=-===~_~== pgsql.build/src/test/recovery/tmp_check/log/002_archiving_primary.log ==~_~===-=-===~_~==
...
> 2021-06-10 16:21:21.107 CEST [342322:4] 002_archiving.pl LOG: disconnection: session time: 0:00:00.022 user=pgbf database=postgres host=[local]
> 2021-06-10 16:23:21.965 CEST [342279:4] LOG: received immediate shutdown request
So the standby2 was stuck after selecting the new timeline and before
updating control file and its postmaster couldn't even respond to
SIGQUIT.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2021-06-11 05:12:36 | Re: locking [user] catalog tables vs 2pc vs logical rep |
Previous Message | Noah Misch | 2021-06-11 05:00:20 | Re: libpq debug log |