| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> | 
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org | 
| Subject: | Inadequate failure reporting from poll_query_until | 
| Date: | 2018-10-15 20:19:16 | 
| Message-ID: | 17913.1539634756@sss.pgh.pa.us | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
I just spent some time puzzling over why we've seen so many recoveryCheck
failures in the buildfarm recently --- excluding failures with known
causes, such as the pg_verify_checksums business, that's basically all
we've seen for the last week.  I am suspicious that there's some
timing-dependent problems in WAL streaming startup and shutdown, but
can't prove anything yet.  However, one case catches my eye:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper&dt=2018-10-15%2016%3A01%3A59
This is the sum total of the visible logged information:
    #
    not ok 2 - asterisk in synchronous_standby_names
    
    #   Failed test 'asterisk in synchronous_standby_names'
    #   at t/007_sync_rep.pl line 26.
The failing test is evidently this one:
    test_sync_state(
            $node_master, qq(standby1|1|sync
    standby2|1|potential
    standby3|1|potential),
            'asterisk in synchronous_standby_names',
            '*');
and when you drill down, you find that PostgresNode.pm's poll_query_until
must have timed out, and that blank line there is as much as it felt like
telling us about it.  Evidently what we were getting back from the server
did not match the expected output, but good luck guessing what it was.
I think we ought to fix poll_query_until so that upon failure, you get
a much more complete and intelligible report, along the lines of
poll_query_until timed out executing this query:
$query
expecting this output:
$expected
last actual query output:
$stdout
with stderr:
$stderr
Objections? Thoughts about better formatting?
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrew Dunstan | 2018-10-15 20:23:29 | Re: Inadequate failure reporting from poll_query_until | 
| Previous Message | Andres Freund | 2018-10-15 20:17:56 | Re: Large writable variables |