From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> |
Cc: | Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: crashes due to setting max_parallel_workers=0 |
Date: | 2017-03-27 16:22:29 |
Message-ID: | CA+Tgmob8DzATg0u31AHZuYQ3TXAt_EmUqHvDwaSRAJt_8ECCoQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Mar 27, 2017 at 12:11 PM, David Rowley
<david(dot)rowley(at)2ndquadrant(dot)com> wrote:
> On 28 March 2017 at 04:57, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sat, Mar 25, 2017 at 12:18 PM, Rushabh Lathia
>> <rushabh(dot)lathia(at)gmail(dot)com> wrote:
>>> About the original issue reported by Tomas, I did more debugging and
>>> found that - problem was gather_merge_clear_slots() was not returning
>>> the clear slot when nreader is zero (means nworkers_launched = 0).
>>> Due to the same scan was continue even all the tuple are exhausted,
>>> and then end up with server crash at gather_merge_getnext(). In the patch
>>> I also added the Assert into gather_merge_getnext(), about the index
>>> should be less then the nreaders + 1 (leader).
>>
>> Well, you and David Rowley seem to disagree on what the fix is here.
>> His patches posted upthread do A, and yours do B, and from a quick
>> look those things are not just different ways of spelling the same
>> underlying fix, but actually directly conflicting ideas about what the
>> fix should be. Any chance you can review his patches, and maybe he
>> can review yours, and we could try to agree on a consensus position?
>> :-)
>
> When comparing Rushabh's to my minimal patch, both change
> gather_merge_clear_slots() to clear the leader's slot. My fix
> mistakenly changes things so it does ExecInitExtraTupleSlot() on the
> leader's slot, but seems that's not required since
> gather_merge_readnext() sets the leader's slot to the output of
> ExecProcNode(outerPlan). I'd ignore my minimal fix because of that
> mistake. Rushabh's patch sidesteps this by adding a conditional
> pfree() to not free slot that we didn't allocate in the first place.
>
> I do think the code could be improved a bit. I don't like the way
> GatherMergeState's nreaders and nworkers_launched are always the same.
> I think this all threw me off a bit and may have been the reason for
> the bug in the first place.
Yeah, if those fields are always the same, then I think that they
should be merged. That seems undeniably a good idea. Possibly we
should fix the crash bug first, though, and then do that afterwards.
What bugs me a little about Rushabh's fix is that it looks like magic.
You have to know that we're looping over two things and freeing them
up, but there's one more of one thing than the other thing. I think
that at least needs some comments or something.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Teodor Sigaev | 2017-03-27 16:25:49 | Re: Potential data loss of 2PC files |
Previous Message | Robert Haas | 2017-03-27 16:19:05 | Re: [COMMITTERS] pgsql: Improve performance of find_all_inheritors() |