Quick Links

Re: crashes due to setting max_parallel_workers=0

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Cc:	Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: crashes due to setting max_parallel_workers=0
Date:	2017-03-27 16:22:29
Message-ID:	CA+Tgmob8DzATg0u31AHZuYQ3TXAt_EmUqHvDwaSRAJt_8ECCoQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 28 March 2017 at 04:57, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that - problem was gather_merge_clear_slots() was not returning
>>> the clear slot when nreader is zero (means nworkers_launched = 0).
>>> Due to the same scan was continue even all the tuple are exhausted,
>>> and then end up with server crash at gather_merge_getnext(). In the patch
>>> I also added the Assert into gather_merge_getnext(), about the index
>>> should be less then the nreaders + 1 (leader).
>>
>> Well, you and David Rowley seem to disagree on what the fix is here.
>> His patches posted upthread do A, and yours do B, and from a quick
>> look those things are not just different ways of spelling the same
>> underlying fix, but actually directly conflicting ideas about what the
>> fix should be. Any chance you can review his patches, and maybe he
>> can review yours, and we could try to agree on a consensus position?
>> :-)
>
> When comparing Rushabh's to my minimal patch, both change
> gather_merge_clear_slots() to clear the leader's slot. My fix
> mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> leader's slot, but seems that's not required since
> gather_merge_readnext() sets the leader's slot to the output of
> ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> mistake. Rushabh's patch sidesteps this by adding a conditional
> pfree() to not free slot that we didn't allocate in the first place.
>
> I do think the code could be improved a bit. I don't like the way
> GatherMergeState's nreaders and nworkers_launched are always the same.
> I think this all threw me off a bit and may have been the reason for
> the bug in the first place.

Yeah, if those fields are always the same, then I think that they
should be merged. That seems undeniably a good idea. Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing. I think
that at least needs some comments or something.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Re: crashes due to setting max_parallel_workers=0 at 2017-03-27 16:11:14 from David Rowley

Responses

Re: crashes due to setting max_parallel_workers=0 at 2017-03-27 16:36:57 from Rushabh Lathia

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Teodor Sigaev	2017-03-27 16:25:49	Re: Potential data loss of 2PC files
Previous Message	Robert Haas	2017-03-27 16:19:05	Re: [COMMITTERS] pgsql: Improve performance of find_all_inheritors()