From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | tgl(at)sss(dot)pgh(dot)pa(dot)us |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Parallel pg_dump's error reporting doesn't work worth squat |
Date: | 2016-05-25 04:01:37 |
Message-ID: | 20160525.130137.175160263.horiguchi.kyotaro@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello,
# Note for convenience for others: The commit fixing the original
# issue is 1aa41e3eae3746e05d0e23286ac740a9a6cee7df.
At Mon, 23 May 2016 13:40:30 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in <1336(dot)1464025230(at)sss(dot)pgh(dot)pa(dot)us>
> I wrote:
> >> ... the pg_dump process has crashed with a SIGPIPE without printing
> >> any message whatsoever, and without coming anywhere near finishing the
> >> dump.
>
> > Attached is a proposed patch for this. It reverts exit_horribly() to
> > what it used to be pre-9.3, ie just print the message on stderr and die.
> > The master now notices child failure by observing EOF on the status pipe.
> > While that works automatically on Unix, we have to explicitly close the
> > child side of the pipe on Windows (could someone check that that works?).
> > I also fixed the parent side to ignore SIGPIPE earlier, so that it won't
> > just die if it was in the midst of sending to the child.
>
> Now that we're all back from PGCon ... does anyone wish to review this?
> Or have an objection to treating it as a bug fix and patching all branches
> back to 9.3?
FWIW, it seems to me a bug which should be fixed to back
branches.
I tried this only on Linux (I'll try it on Windows afterward) and
got something like this.
pg_dump: [archiver (db)] pg_dump: [archiver (db)] connection to database "postgres" failed: fe_sendauth: no password supplied
connection to database "postgres" failed: fe_sendauth: no password supplied
pg_dump: [archiver (db)] connection to database "postgres" failed: fe_sendauth: no password supplied
<some repeats>
pg_dump: [parallel archiver] a worker process died unexpectedly
pg_dump: [archiver (db)] connection to database "postgres" failed: fe_sendauth: no password supplied
<some repeats>
Although the error messages from multiple children are cluttered,
it would be inevitable and better than vanishing.
It seems hard to distinguish the meaning among the module names
enclosed by '[]' but it is another issue.
In archive_close_connection, the following change means that
si->AHX could be NULL there, as the existing code for
non-parallel does. But it seems to be set once for
si(=shutdown_info)'s lifetime, before reaching there, to a valid
value.
- DisconnectDatabase(si->AHX);
+ if (si->AHX)
+ DisconnectDatabase(si->AHX);
Shouldn't archive_close_connection have an assertion (si->AHX !=
NULL) instead of checking "if (si->AHX)" in each path?
> > BTW, after having spent the afternoon staring at it, I will assert with
> > confidence that pg_dump/parallel.c is an embarrassment to the project.
>
> I've done a bit of work on a cosmetic cleanup patch, but don't have
> anything to show yet.
>
> regards, tom lane
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2016-05-25 04:15:57 | Re: Parallel pg_dump's error reporting doesn't work worth squat |
Previous Message | Stephen Frost | 2016-05-25 03:33:08 | Re: Error during restore - dump taken with pg_dumpall -c option |