RE: Parallel Inserts in CREATE TABLE AS

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Date: 2020-11-27 06:27:34
Message-ID: 3129fda75d9c4e44ba13acd5f0ce8e88@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

> > I took a deep look at the projection logic.
> > In most cases, you are right that Gather node does not need projection.
> >
> > In some rare cases, such as Subplan (or initplan I guess).
> > The projection will happen in Gather node.
> >
> > The example:
> >
> > Create table test(i int);
> > Create table test2(a int, b int);
> > insert into test values(generate_series(1,10000000,1));
> > insert into test2 values(generate_series(1,1000,1),
> > generate_series(1,1000,1));
> >
> > postgres=# explain(verbose, costs off) select test.i,(select i from
> (select * from test2) as tt limit 1) from test where test.i < 2000;
> > QUERY PLAN
> > ----------------------------------------
> > Gather
> > Output: test.i, (SubPlan 1)
> > Workers Planned: 2
> > -> Parallel Seq Scan on public.test
> > Output: test.i
> > Filter: (test.i < 2000)
> > SubPlan 1
> > -> Limit
> > Output: (test.i)
> > -> Seq Scan on public.test2
> > Output: test.i
> >
> > In this case, projection is necessary, because the subplan will be
> > executed in projection.
> >
> > If skipped, the table created will loss some data.
> >
>
> Thanks a lot for the use case. Yes with the current patch table will lose
> data related to the subplan. On analyzing further, I think we can not allow
> parallel inserts in the cases when the Gather node has some projections
> to do. Because the workers can not perform that projection. So, having
> ps_ProjInfo in the Gather node is an indication for us to disable parallel
> inserts and only the leader can do the insertions after the Gather node
> does the required projections.
>
> Thoughts?
>

Agreed.

2.
@@ -166,6 +228,16 @@ ExecGather(PlanState *pstate)
{
ParallelContext *pcxt;

+ /*
+ * Take the necessary information to be passed to workers for
+ * parallel inserts in CTAS.
+ */
+ if (ISCTAS(node->ps.intoclause))
+ {
+ node->ps.lefttree->intoclause = node->ps.intoclause;
+ node->ps.lefttree->objectid = node->ps.objectid;
+ }
+
/* Initialize, or re-initialize, shared state needed by workers. */
if (!node->pei)
node->pei = ExecInitParallelPlan(node->ps.lefttree,

I found the code pass intoclause and objectid to Gather node's lefttree.
Is it necessary? It seems only Gather node will use the information.

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-11-27 06:38:43 Re: Disable WAL logging to speed up data loading
Previous Message Masahiko Sawada 2020-11-27 06:07:34 Re: Disable WAL logging to speed up data loading