Re: Parallel Inserts in CREATE TABLE AS

From: Zhihong Yu <zyu(at)yugabyte(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-01-07 04:23:14
Message-ID: CALNJ-vTXr_y0cA6kYDtTC0fTnRhdgWKyq48TOmczzRzTUfJo8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks for the clarification.

w.r.t. the commit message, maybe I was momentarily sidetracked by the
phrase: With this change.
It seems the scenarios you listed are known parallel safety constraints.

Probably rephrase that sentence a little bit to make this clearer.

Cheers

On Wed, Jan 6, 2021 at 8:01 PM Bharath Rupireddy <
bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:

> On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
> >
> > Hi,
> > For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
> >
> > workers to Gather node to 0. With this change, there are chances
> > that the planner may choose the parallel plan.
> >
> > It would be nice if the scenarios where a parallel plan is not chosen
> are listed.
>
> There are many reasons, the planner may not choose a parallel plan for
> the select part, for instance if there are temporary tables, parallel
> unsafe functions, or the parallelism GUCs are not set properly,
> foreign tables and so on. see
> https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
> think so, we will add all the scenarios into the commit message.
>
> Having said that, we have extensive comments in the code(especially in
> the function SetParallelInsertState) about when parallel inserts are
> chosen.
>
> + * Parallel insertions are possible only if the upper node is Gather.
> */
> + if (!IsA(gstate, GatherState))
> return;
>
> + * Parallelize inserts only when the upper Gather node has no
> projections.
> */
> + if (!gstate->ps.ps_ProjInfo)
> + {
> + /* Okay to parallelize inserts, so mark it. */
> + if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> + ((DR_intorel *) dest)->is_parallel = true;
> +
> + /*
> + * For parallelizing inserts, we must send some information so
> that the
> + * workers can build their own dest receivers. For CTAS, this
> info is
> + * into clause, object id (to open the created table).
> + *
> + * Since the required information is available in the dest
> receiver,
> + * store a reference to it in the Gather state so that it will be
> used
> + * in ExecInitParallelPlan to pick the information.
> + */
> + gstate->dest = dest;
> + }
> + else
> + {
> + /*
> + * Upper Gather node has projections, so parallel insertions are
> not
> + * allowed.
> + */
>
> > + if ((root->parse->parallelInsCmdTupleCostOpt &
> > + PARALLEL_INSERT_SELECT_QUERY) &&
> > + (root->parse->parallelInsCmdTupleCostOpt &
> > + PARALLEL_INSERT_CAN_IGN_TUP_COST))
> > + {
> > + /* We are ignoring the parallel tuple cost, so mark it. */
> > + root->parse->parallelInsCmdTupleCostOpt |=
> > +
> PARALLEL_INSERT_TUP_COST_IGNORED;
> >
> > If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and
> PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED
> is implied.
> >
> > Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the
> setting (1) of the first two bits should suffice.
>
> The way these flags work is as follows: before planning in CTAS, we
> set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
> gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
> actually ignored the tuple cost in cost_gather we set
> PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
> PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
> generate_useful_gather_paths, and the function
> generate_useful_gather_paths can return before reaching cost_gather,
> see below snippets. So, we need the 3 flags.
>
> void
> generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> ListCell *lc;
> double rows;
> double *rowsp = NULL;
> List *useful_pathkeys_list = NIL;
> Path *cheapest_partial_path = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
> /* Should we override the rel's rowcount estimate? */
> if (override_rows)
> rowsp = &rows;
>
> /* generate the regular gather (merge) paths */
> generate_gather_paths(root, rel, override_rows);
>
> void
> generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
> override_rows)
> {
> Path *cheapest_partial_path;
> Path *simple_gather_path;
> ListCell *lc;
> double rows;
> double *rowsp = NULL;
>
> /* If there are no partial paths, there's nothing to do here. */
> if (rel->partial_pathlist == NIL)
> return;
>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2021-01-07 04:30:32 Re: [PATCH] Feature improvement for CLOSE, FETCH, MOVE tab completion
Previous Message Amit Kapila 2021-01-07 04:23:05 Re: Single transaction in the tablesync worker?