From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Luc Vlaming <luc(at)swarm64(dot)com> |
Cc: | "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2021-01-04 13:53:11 |
Message-ID: | CALj2ACW376px6jYXhmu4EPQNsOwf42J1S=9eKtoN-tCgOypVRg@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Jan 4, 2021 at 5:44 PM Luc Vlaming <luc(at)swarm64(dot)com> wrote:
> On 04-01-2021 12:16, Hou, Zhijie wrote:
> >> ================
> >> wrt v18-0002....patch:
> >>
> >> It looks like this introduces a state machine that goes like:
> >> - starts at CTAS_PARALLEL_INS_UNDEF
> >> - possibly moves to CTAS_PARALLEL_INS_SELECT
> >> - CTAS_PARALLEL_INS_TUP_COST_CAN_IGN can be added
> >> - if both were added at some stage, we can go to
> >> CTAS_PARALLEL_INS_TUP_COST_IGNORED and ignore the costs
> >>
> >> What I'm wondering is why you opted to put logic around
> >> generate_useful_gather_paths and in cost_gather, when to me it seems more
> >> logical to put it in create_gather_path. I'm probably missing something
> >> there?
> >
> > IMO, the reason is that we want to make sure we ignore the cost only when Gather is the top node.
> > And it seems the generate_useful_gather_paths call in apply_scanjoin_target_to_paths is the right place, since it can only create a top-node Gather.
> > So we change the flag in apply_scanjoin_target_to_paths around generate_useful_gather_paths to identify the top node.
Right. We wanted to ignore the parallel tuple cost only for the upper Gather path.
> I was actually wondering whether we need the state machine. The reason is that,
> AFAICS, the code could be placed in create_gather_path, where you can
> also check whether it is a top Gather node, whether the dest receiver is the
> right type, etc. To me that seems like a nicer solution, as it puts
> all the logic that decides whether or not a parallel CTAS is valid
> in a single place instead of distributing it over various places.
IMO, we can't determine in create_gather_path whether we are going to
generate the top Gather path. To decide whether a Gather path is the top
one, I think it's not enough to check root->query_level == 1; we also
need to rely on where generate_useful_gather_paths gets called from. For
instance, for query_level 1, generate_useful_gather_paths gets called
from 2 places in apply_scanjoin_target_to_paths. Likewise,
create_gather_path also gets called from many places. IMO, the current
way, i.e. setting the flag in apply_scanjoin_target_to_paths and
ignoring the cost based on it in cost_gather, seems safe.
I may be wrong. Thoughts?
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com