From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Subject: | Re: Parallel worker hangs while handling errors. |
Date: | 2020-07-23 07:24:05 |
Message-ID: | CALDaNm2g5GgBpzGgdwhFgxD=ur42O+Fh+prp-xiyHp+feB+q=w@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for reviewing and adding your thoughts, My comments are inline.
On Fri, Jul 17, 2020 at 1:21 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> The same hang issue can occur(though I'm not able to back it up with a
> use case), in the cases from wherever the EmitErrorReport() is called
> from "if (sigsetjmp(local_sigjmp_buf, 1) != 0)" block, such as
> autovacuum.c, bgwriter.c, bgworker.c, checkpointer.c, walwriter.c, and
> postgres.c.
>
I'm not sure if this can occur in other cases.
> >
> > One of the fixes could be to call BackgroundWorkerUnblockSignals just
> > after sigsetjmp. I'm not sure if this is the best solution.
> > Robert & myself had a discussion about the problem yesterday. We felt
> > this is a genuine problem with the parallel worker error handling and
> > need to be fixed.
> >
>
> Note that, in all sigsetjmp blocks, we intentionally
> HOLD_INTERRUPTS(), to not cause any issues while performing error
> handling, I'm concerned here that now, if we directly call
> BackgroundWorkerUnblockSignals() which will open up all the signals
> and our main intention of holding interrupts or block signals may go
> away.
>
> Since the main problem for this hang issue is because of blocking
> SIGUSR1, in sigsetjmp, can't we just only unblock only the SIGUSR1,
> instead of unblocking all signals? I tried this with parallel copy
> hang, the issue is resolved.
>
On putting further thoughts on this, I feel just unblocking SIGUSR1
would be the right approach in this case. I'm attaching a new patch
which unblocks SIGUSR1 signal. I have verified that the original issue
with WIP parallel copy patch gets fixed. I have made changes only in
bgworker.c as we require the parallel worker to receive this signal
and continue processing. I have not included the changes for other
processes as I'm not sure if this scenario is applicable for other
processes.
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
0001-Fix-for-Parallel-worker-hangs-while-handling-errors.patch | text/x-patch | 1.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Ahsan Hadi | 2020-07-23 07:34:23 | Re: Transactions involving multiple postgres foreign servers, take 2 |
Previous Message | Pavel Stehule | 2020-07-23 06:54:25 | Re: [Proposal] Global temporary tables |