Re: SegFault on 9.6.14

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jerry Sievers <gsievers19(at)comcast(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: SegFault on 9.6.14
Date: 2019-07-27 05:58:05
Message-ID: CAA4eK1Ja-eoavXcr0eq7w7hP=64VP49k=NMFxwhtK28NHfBOdA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jul 27, 2019 at 8:29 AM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> On Fri, Jul 26, 2019 at 4:13 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Tue, Jul 23, 2019 at 5:28 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > Right, that will be lesser code churn and it can also work. However,
> > > one thing that needs some thought is till now es_top_eflags is only
> > > set in ExecutorStart and same is mentioned in comments where it is
> > > declared and it seems we are going to change that with this idea. How
> > > about having a separate function ExecBlahShutdown which will clean up
> > > resources as parallel context and can be called only from ExecutePlan
> > > where we are calling ExecShutdownNode? I think both these and the
> > > other solution we have discussed are on similar lines and another idea
> > > could be to relax the assert which again is not a superb idea.
> >
> > It seems we don't have a clear preference for any particular solution
> > among these and neither there appears to be any better idea. I guess
> > we can wait for a few days to see if Robert has any views on this,
> > otherwise, pick one of the above and move ahead.
>
> I take the EXEC_FLAG_DONE idea back. It's ugly and too hard to verify
> that every appropriate path sets it, and a flag that means the
> opposite would be even more of a kluge, and generally I think I was
> looking at this too myopically: I was looking for a way to shut down
> processes ASAP without giving up the shared memory we'll need for
> rescanning, but what I should have been looking at is the reason you
> did that in the first place: to get the instrumentation data. Can you
> explain why it's necessary to do that explicitly for Limit? Wouldn't
> the right place to collect instrumentation be at the end of execution
> when Shutdown will run in all cases anyway (and possibly also during
> ExecParallelReinitialize() or something like that if it's being
> clobbered by rescans, I didn't check)? What's special about Limit?
>

I think here you are missing the point that to collect the
instrumentation information one also need to use InstrStartNode and
InstrStopNode. So, for the Limit node, the InstrStopNode would be
already done by the time we call shutdown of workers at the end of
execution. To know a bit more details, see [1][2][3].

> Today while poking at this and trying to answer those questions for
> myself, I realised that the repro I posted earlier[1] crashes exactly
> as Jerry reported on REL9_6_STABLE, but in later release branches it
> runs to completion. That's because the crashing code was removed in
> commit 41b0dd98 "Separate reinitialization of shared parallel-scan
> state from ExecReScan.".
>
> So newer branches get past that problem, but they all spit out tons of
> each of these three warnings:
>
> WARNING: buffer refcount leak: [172] (rel=base/12558/16390,
> blockNum=5, flags=0x93800000, refcount=1 2998)
> ...
> WARNING: relcache reference leak: relation "join_bar" not closed
> ...
> WARNING: Snapshot reference leak: Snapshot 0x7ff20383bfb0 still referenced
> ...
>
> Oops.
>

This is exactly due to the same problem that before rescans, we have
destroyed the shared memory. If you do the earlier trick of not
cleaning up shared memory till ExecEndNode, then you won't see this
problem.

[1] - https://www.postgresql.org/message-id/CAA4eK1KZEbYKj9HHP-6WqqjAXuoB%2BWJu-w1s9uovj%3DeeBxC48Q%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CA%2BTgmoY3kcTcc5bFCZeY5NMFna-xaMPuTHA-z-z2Bmfg%2Bdb-XQ%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAA4eK1L0KAZWgnRJz%3DVNVpyS3FFbVh8E5egyziaR0E10bC204Q%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2019-07-27 06:22:22 Re: Initdb failure
Previous Message Pavel Stehule 2019-07-27 05:44:31 Re: proposal: type info support functions for functions that use "any" type