RE: Parallel Inserts in CREATE TABLE AS

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Parallel Inserts in CREATE TABLE AS
Date: 2020-11-26 06:44:40
Message-ID: 30fa39b09b614f799df364f59307e57e@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi ,

> On Thu, Nov 26, 2020 at 7:47 AM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
> wrote:
> >
> > Hi,
> >
> > I have an issue about the following code:
> >
> > econtext = node->ps.ps_ExprContext;
> > ResetExprContext(econtext);
> >
> > + if (ISCTAS(node->ps.intoclause))
> > + {
> > + ExecParallelInsertInCTAS(node);
> > + return NULL;
> > + }
> >
> > /* If no projection is required, we're done. */
> > if (node->ps.ps_ProjInfo == NULL)
> > return slot;
> >
> > /*
> > * Form the result tuple using ExecProject(), and return it.
> > */
> > econtext->ecxt_outertuple = slot;
> > return ExecProject(node->ps.ps_ProjInfo);
> >
> > It seems the projection will be skipped.
> > Is this because projection is not required in this case ?
> > (I'm not very familiar with where the projection will be.)
> >
>
> For parallel inserts in CTAS, I don't think we need to project the tuples
> being returned from the underlying plan nodes, and also we have nothing
> to project from the Gather node further up. The required projection will
> happen while the tuples are being returned from the underlying nodes and
> the projected tuples are being directly fed to CTAS's dest receiver
> intorel_receive(), from there into the created table. We don't need
> ExecProject again in ExecParallelInsertInCTAS().
>
> For instance, projection will always be done when the tuple is being returned
> from an underlying sequential scan node(see ExecScan() -->
> ExecProject() and this is true for both leader and workers. In both leader
> and workers, we are just calling CTAS's dest receiver intorel_receive().
>
> Thoughts?

I took a deep look at the projection logic.
In most cases, you are right that Gather node does not need projection.

In some rare cases, such as Subplan (or initplan I guess).
The projection will happen in Gather node.

The example:

Create table test(i int);
Create table test2(a int, b int);
insert into test values(generate_series(1,10000000,1));
insert into test2 values(generate_series(1,1000,1), generate_series(1,1000,1));

postgres=# explain(verbose, costs off) select test.i,(select i from (select * from test2) as tt limit 1) from test where test.i < 2000;
QUERY PLAN
----------------------------------------
Gather
Output: test.i, (SubPlan 1)
Workers Planned: 2
-> Parallel Seq Scan on public.test
Output: test.i
Filter: (test.i < 2000)
SubPlan 1
-> Limit
Output: (test.i)
-> Seq Scan on public.test2
Output: test.i

In this case, projection is necessary,
because the subplan will be executed in projection.

If skipped, the table created will loss some data.

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Khandekar 2020-11-26 06:48:12 Re: Improving spin-lock implementation on ARM.
Previous Message Pavel Stehule 2020-11-26 06:42:33 Re: proposal: possibility to read dumped table's name from file