Re: SegFault on 9.6.14

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SegFault on 9.6.14
Date: 2019-11-18 08:52:44
Message-ID: CAA4eK1KkhPSt8wxvoQgFRA1dePs8JvTHdTjq=vyZqEmsAn3c-g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 18, 2019 at 10:08 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Oct 17, 2019 at 10:51 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> >
> > ===
> > Don't shut down Gather[Merge] early under Limit.
> >
> > Revert part of commit 19df1702f5.
> >
> > Early shutdown was added by that commit so that we could collect
> > statistics from workers, but unfortunately it interacted badly with
> > rescans. Rescanning a Limit over a Gather node could produce a SEGV
> > on 9.6 and resource leak warnings on later releases. By reverting the
> > early shutdown code, we might lose statistics in some cases of Limit
> > over Gather, but that will require further study to fix.
> >
>
> How about some text like below? I have added slightly different text
> to explain the reason for the problem.
>
> "Early shutdown was added by that commit so that we could collect
> statistics from workers, but unfortunately, it interacted badly with
> rescans. The problem is that we ended up destroying the parallel
> context which is required for rescans. This leads to rescans of a
> Limit node over a Gather node to produce unpredictable results as it
> tries to access destroyed parallel context. By reverting the early
> shutdown code, we might lose statistics in some cases of Limit over
> Gather, but that will require further study to fix."
>
> I am not sure but we can even add a comment in the place where we are
> removing some code (in nodeLimit.c) to indicate that 'Ideally we
> should shutdown parallel resources here to get the correct stats, but
> that would lead to rescans misbehaving when there is a Gather [Merge]
> node beneath it. (Explain the reason for misbehavior and the ideas we
> discussed in this thread to fix the same) ........."
>
> I can try to come up with comments in nodeLimit.c on the above lines
> if we think that is a good idea?
>

I have modified the commit message as proposed above and additionally
added comments in nodeLimit.c. I think we should move ahead with this
bug-fix patch. If we don't like the comment, it can anyway be
improved later.

Any suggestions?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment Content-Type Size
0001-Don-t-shut-down-Gather-Merge-early-under-Limit.patch application/octet-stream 5.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2019-11-18 08:53:11 Re: adding partitioned tables to publications
Previous Message Jinbao Chen 2019-11-18 06:48:17 Planner chose a much slower plan in hashjoin, using a large table as the inner table.