From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Race condition in server-crash testing |
Date: | 2022-04-04 04:50:27 |
Message-ID: | 1801850.1649047827@sss.pgh.pa.us |
Lists: | pgsql-hackers |
My pet dinosaur gaur just failed [1] in
src/test/recovery/t/022_crash_temp_files.pl, which does this:
-----
my $ret = PostgreSQL::Test::Utils::system_log('pg_ctl', 'kill', 'KILL', $pid);
is($ret, 0, 'killed process with KILL');
# Close psql session
$killme->finish;
$killme2->finish;
# Wait till server restarts
$node->poll_query_until('postgres', undef, '');
-----
It's hard to be totally sure, but I think what happened is that
gaur hit the in-hindsight-obvious race condition in this code:
we managed to execute a successful iteration of poll_query_until
before the postmaster had noticed its dead child and commenced
the restart. The test lines after these are not prepared to see
failure-to-connect.
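
To make the suspected interleaving concrete, here is a toy model in Python. It is only a sketch under stated assumptions: `FakeServer`, `notice_tick`, `ready_tick`, and `poll_until_connect` are hypothetical names, not part of PostgreSQL's test framework, and the tick values are invented for illustration. The model assumes the server keeps accepting connections until the postmaster notices the dead child, then refuses them until the restart completes.

```python
from dataclasses import dataclass


@dataclass
class FakeServer:
    """Toy model (hypothetical): the postmaster notices the killed backend
    at `notice_tick` and refuses connections until `ready_tick`."""
    notice_tick: int
    ready_tick: int

    def accepts_connections(self, tick: int) -> bool:
        # Before the postmaster reacts, and after the restart finishes,
        # connections succeed; in between they fail.
        return tick < self.notice_tick or tick >= self.ready_tick


def poll_until_connect(server: FakeServer, start_tick: int, max_ticks: int = 100) -> int:
    """Return the first tick at or after start_tick at which a connection succeeds."""
    for tick in range(start_tick, start_tick + max_ticks):
        if server.accepts_connections(tick):
            return tick
    raise TimeoutError("server never accepted a connection")


# Slow postmaster: the kill happens at tick 0, but the restart only
# begins at tick 2 and completes at tick 5.
slow = FakeServer(notice_tick=2, ready_tick=5)
returned_at = poll_until_connect(slow, start_tick=0)
# The poll "succeeds" immediately, before the restart has even begun,
# so a follow-up query a few ticks later lands inside the restart window.
followup_ok = slow.accepts_connections(returned_at + 3)

# Prompt postmaster: the restart begins at tick 0, so the same poll
# genuinely waits for the restart to finish.
prompt = FakeServer(notice_tick=0, ready_tick=5)
prompt_returned_at = poll_until_connect(prompt, start_tick=0)
prompt_followup_ok = prompt.accepts_connections(prompt_returned_at + 3)
```

In the slow case the poll returns at tick 0 and the follow-up connection fails, which matches the failure mode described above. In the prompt case the poll returns only once the restart is done. The model does not suggest a fix; it only shows why a single "wait until a connection succeeds" step cannot tell the two cases apart.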
It's not obvious to me how to remove this race condition.
Thoughts?
regards, tom lane
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=gaur&dt=2022-04-03%2021%3A14%3A41