From: | "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: Parallel Inserts in CREATE TABLE AS |
Date: | 2020-11-27 06:27:34 |
Message-ID: | 3129fda75d9c4e44ba13acd5f0ce8e88@G08CNEXMBPEKD05.g08.fujitsu.local |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
> > I took a deep look at the projection logic.
> > In most cases, you are right that Gather node does not need projection.
> >
> > In some rare cases, such as Subplan (or initplan I guess).
> > The projection will happen in Gather node.
> >
> > The example:
> >
> > Create table test(i int);
> > Create table test2(a int, b int);
> > insert into test values(generate_series(1,10000000,1));
> > insert into test2 values(generate_series(1,1000,1),
> > generate_series(1,1000,1));
> >
> > postgres=# explain(verbose, costs off) select test.i,(select i from
> (select * from test2) as tt limit 1) from test where test.i < 2000;
> > QUERY PLAN
> > ----------------------------------------
> > Gather
> > Output: test.i, (SubPlan 1)
> > Workers Planned: 2
> > -> Parallel Seq Scan on public.test
> > Output: test.i
> > Filter: (test.i < 2000)
> > SubPlan 1
> > -> Limit
> > Output: (test.i)
> > -> Seq Scan on public.test2
> > Output: test.i
> >
> > In this case, projection is necessary, because the subplan will be
> > executed in projection.
> >
> > If skipped, the table created will loss some data.
> >
>
> Thanks a lot for the use case. Yes with the current patch table will lose
> data related to the subplan. On analyzing further, I think we can not allow
> parallel inserts in the cases when the Gather node has some projections
> to do. Because the workers can not perform that projection. So, having
> ps_ProjInfo in the Gather node is an indication for us to disable parallel
> inserts and only the leader can do the insertions after the Gather node
> does the required projections.
>
> Thoughts?
>
Agreed.
2.
@@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;
+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (ISCTAS(node->ps.intoclause))
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,
I found the code pass intoclause and objectid to Gather node's lefttree.
Is it necessary? It seems only Gather node will use the information.
Best regards,
houzj
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-11-27 06:38:43 | Re: Disable WAL logging to speed up data loading |
Previous Message | Masahiko Sawada | 2020-11-27 06:07:34 | Re: Disable WAL logging to speed up data loading |