Re: Parallel Inserts in CREATE TABLE AS

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-15 09:20:14
Message-ID: CAFiTN-tiM+A5fL6fXuX9xq7MTsni1zb+8aRNkJq15BbFPpA8fw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Dec 15, 2020 at 2:06 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Dec 14, 2020 at 6:08 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> > Currently with the patch, we can allow parallel CTAS when topnode is Gather.
> > When top-node is Append and Gather is the sub-node of Append, I think we can still enable
> > Parallel CTAS by pushing Parallel CTAS down to the sub-node Gather, such as:
> >
> > Append
> > ------>Gather
> > --------->Create table
> > ------------->Seqscan
> > ------>Gather
> > --------->create table
> > ------------->Seqscan
> >
> > And the use case seems common to me, such as:
> > select * from A where xxx union all select * from B where xxx;
>
> Thanks for the append use case.
>
> Here's my analysis on pushing parallel inserts down even in case the
> top node is Append.
>
> For union cases which need to remove duplicate tuples, we can't push
> the inserts or CTAS dest receiver down. If I'm not wrong, Append node
> is not doing duplicate removal(??), I saw that it's the HashAggregate
> node (which is the top node that removes the duplicate tuples). And
> also for except/except all/intersect/intersect all cases we receive
> HashSetOp nodes on top of Append. So for both cases, our check for
> Gather or Append at the top node is enough to detect this to not allow
> parallel inserts.
>
> For union all:
> case 1: We can push the CTAS dest receiver to each Gather node
> Append
> ->Gather
> ->Parallel Seq Scan
> ->Gather
> ->Parallel Seq Scan
> ->Gather
> ->Parallel Seq Scan
>
> case 2: We can still push the CTAS dest receiver to each Gather node.
> Non-Gather nodes will do inserts as they do now i.e. by sending tuples
> to Append and from there to CTAS dest receiver.
> Append
> ->Gather
> ->Parallel Seq Scan
> ->Seq Scan / Join / any other non-Gather node
> ->Gather
> ->Parallel Seq Scan
> ->Seq Scan / Join / any other non-Gather node
>
> case 3: We can push the CTAS dest receiver to Gather
> Gather
> ->Parallel Append
> ->Parallel Seq Scan
> ->Parallel Seq Scan
>
> case 4: We can push the CTAS dest receiver to Gather
> Gather
> ->Parallel Append
> ->Parallel Seq Scan
> ->Parallel Seq Scan
> ->Seq Scan / Join / any other non-Gather node
>
> Please let me know if I'm missing any other possible use case.
>
> Thoughts?

Your analysis looks right to me.

> > I attach a WIP patch which just show the possibility of this feature.
> > The patch is based on the latest v11-patch.
> >
> > What do you think?
>
> As suggested by Amit earlier, I kept the 0001 patch(so far) such that
> it doesn't have the code to influence the planner to consider parallel
> tuple cost as 0. It works on the plan whatever gets generated and
> decides to allow parallel inserts or not. And in the 0002 patch, I
> added the code for influencing the planner to consider parallel tuple
> cost as 0. Maybe we can have a 0003 patch for tests alone.

Yeah, that makes sense and it will be easy for the review.

> Once we are okay with the above analysis and use cases, we can
> incorporate the Append changes to respective patches.
>
> Hope that's okay.

Make sense to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2020-12-15 09:22:44 Re: Minor documentation error regarding streaming replication protocol
Previous Message Greg Nancarrow 2020-12-15 09:04:55 Re: libpq debug log