From: | Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> |
---|---|
To: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: crashes due to setting max_parallel_workers=0 |
Date: | 2017-03-28 09:07:39 |
Message-ID: | CAGPqQf0CcH+cyU=WK+bUvwgFEVFdGPFaBC2RMRz_hPCEKEFhwg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Mar 28, 2017 at 10:00 AM, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com
> wrote:
>
>
> On 03/27/2017 01:40 PM, Rushabh Lathia wrote:
>
>>
>> ...
>> I was doing more testing with the patch and I found one more server
>> crash with the patch around same area, when we forced the gather
>> merge for the scan having zero rows.
>>
>> create table dept ( deptno numeric, dname varchar(20);
>> set parallel_tuple_cost =0;
>> set parallel_setup_cost =0;
>> set min_parallel_table_scan_size =0;
>> set min_parallel_index_scan_size =0;
>> set force_parallel_mode=regress;
>> explain analyze select * from dept order by deptno;
>>
>> This is because for leader we don't initialize the slot into gm_slots. So
>> in case where launched worker is zero and table having zero rows, we
>> end up having NULL slot into gm_slots array.
>>
>> Currently gather_merge_clear_slots() clear out the tuple table slots for
>> each
>> gather merge input and returns clear slot. In the patch I modified
>> function
>> gather_merge_clear_slots() to just clear out the tuple table slots and
>> always return NULL when All the queues and heap us exhausted.
>>
>>
> Isn't that just another sign the code might be a bit too confusing? I see
> two main issues in the code:
>
> 1) allocating 'slots' as 'nreaders+1' elements, which seems like a good
> way to cause off-by-one errors
>
> 2) mixing objects with different life spans (slots for readers vs. slot
> for the leader) seems like a bad idea too
>
> I wonder how much we gain by reusing the slot from the leader (I'd be
> surprised if it was at all measurable). David posted a patch reworking
> this, and significantly simplifying the GatherMerge node. Why not to accept
> that?
>
>
>
I think we all agree that we should get rid of nreaders from the
GatherMergeState
and need to do some code re-factor. But if I understood correctly that
Robert's
concern was to do that re-factor as separate commit.
I picked David's patch and started reviewing the changes. I applied that
patch
on top of my v2 patch (which does the re-factor of
gather_merge_clear_slots).
In David's patch, into gather_merge_init(), a loop where tuple array is
getting
allocate, that loop need to only up to nworkers_launched. Because we don't
hold the tuple array for leader. I changed that and did some other simple
changes based on mine v2 patch. I also performed manual testing with
the changes.
Please find attached re-factor patch, which is v2 patch submitted for the
server crash fix. (Attaching both the patch here again, for easy of access).
Thanks,
--
Rushabh Lathia
www.EnterpriseDB.com
Attachment | Content-Type | Size |
---|---|---|
gather_merge_refactor.patch | text/x-patch | 6.7 KB |
gm_nworkers_launched_zero_v2.patch | text/x-patch | 1.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2017-03-28 09:14:01 | xmltable doc fix and example for XMLNAMESPACES |
Previous Message | Mithun Cy | 2017-03-28 09:00:55 | Re: [POC] A better way to expand hash indexes. |