From: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
---|---|
To: | Zhihong Yu <zyu(at)yugabyte(dot)com> |
Cc: | Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2021-01-07 04:01:44 |
Message-ID: | CALj2ACXcCHNwwkSfj4Zuf8pP2AFS0eK-XKLS-0sY4okhWBgRXA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Jan 7, 2021 at 5:12 AM Zhihong Yu <zyu(at)yugabyte(dot)com> wrote:
>
> Hi,
> For v20-0002-Tuple-Cost-Adjustment-for-Parallel-Inserts-in-CTAS.patch :
>
> workers to Gather node to 0. With this change, there are chances
> that the planner may choose the parallel plan.
>
> It would be nice if the scenarios where a parallel plan is not chosen are listed.
There are many reasons, the planner may not choose a parallel plan for
the select part, for instance if there are temporary tables, parallel
unsafe functions, or the parallelism GUCs are not set properly,
foreign tables and so on. see
https://www.postgresql.org/docs/devel/parallel-safety.html. I don't
think so, we will add all the scenarios into the commit message.
Having said that, we have extensive comments in the code(especially in
the function SetParallelInsertState) about when parallel inserts are
chosen.
+ * Parallel insertions are possible only if the upper node is Gather.
*/
+ if (!IsA(gstate, GatherState))
return;
+ * Parallelize inserts only when the upper Gather node has no projections.
*/
+ if (!gstate->ps.ps_ProjInfo)
+ {
+ /* Okay to parallelize inserts, so mark it. */
+ if (ins_cmd == PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
+ ((DR_intorel *) dest)->is_parallel = true;
+
+ /*
+ * For parallelizing inserts, we must send some information so that the
+ * workers can build their own dest receivers. For CTAS, this info is
+ * into clause, object id (to open the created table).
+ *
+ * Since the required information is available in the dest receiver,
+ * store a reference to it in the Gather state so that it will be used
+ * in ExecInitParallelPlan to pick the information.
+ */
+ gstate->dest = dest;
+ }
+ else
+ {
+ /*
+ * Upper Gather node has projections, so parallel insertions are not
+ * allowed.
+ */
> + if ((root->parse->parallelInsCmdTupleCostOpt &
> + PARALLEL_INSERT_SELECT_QUERY) &&
> + (root->parse->parallelInsCmdTupleCostOpt &
> + PARALLEL_INSERT_CAN_IGN_TUP_COST))
> + {
> + /* We are ignoring the parallel tuple cost, so mark it. */
> + root->parse->parallelInsCmdTupleCostOpt |=
> + PARALLEL_INSERT_TUP_COST_IGNORED;
>
> If I read the code correctly, when both PARALLEL_INSERT_SELECT_QUERY and PARALLEL_INSERT_CAN_IGN_TUP_COST are set, PARALLEL_INSERT_TUP_COST_IGNORED is implied.
>
> Maybe we don't need the PARALLEL_INSERT_TUP_COST_IGNORED enum - the setting (1) of the first two bits should suffice.
The way these flags work is as follows: before planning in CTAS, we
set PARALLEL_INSERT_SELECT_QUERY, before we go for generating upper
gather path we set PARALLEL_INSERT_CAN_IGN_TUP_COST, and when we
actually ignored the tuple cost in cost_gather we set
PARALLEL_INSERT_TUP_COST_IGNORED. There are chances that we set
PARALLEL_INSERT_CAN_IGN_TUP_COST before calling
generate_useful_gather_paths, and the function
generate_useful_gather_paths can return before reaching cost_gather,
see below snippets. So, we need the 3 flags.
void
generate_useful_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool
override_rows)
{
ListCell *lc;
double rows;
double *rowsp = NULL;
List *useful_pathkeys_list = NIL;
Path *cheapest_partial_path = NULL;
/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;
/* Should we override the rel's rowcount estimate? */
if (override_rows)
rowsp = &rows;
/* generate the regular gather (merge) paths */
generate_gather_paths(root, rel, override_rows);
void
generate_gather_paths(PlannerInfo *root, RelOptInfo *rel, bool override_rows)
{
Path *cheapest_partial_path;
Path *simple_gather_path;
ListCell *lc;
double rows;
double *rowsp = NULL;
/* If there are no partial paths, there's nothing to do here. */
if (rel->partial_pathlist == NIL)
return;
With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Merlin Moncure | 2021-01-07 04:09:27 | Re: plpgsql variable assignment with union is broken |
Previous Message | Tang, Haiying | 2021-01-07 03:57:44 | RE: [Patch] Optimize dropping of relation buffers using dlist |