Re: Parallel Inserts in CREATE TABLE AS

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2020-12-24 04:55:06
Message-ID: CALDaNm1XpwEoHS9U_zZ2GPSZ_qKZAc=VSa4VTO66J6k5Gzr=8Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 22, 2020 at 2:16 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Dec 22, 2020 at 12:32 PM Bharath Rupireddy
> Attaching v14 patch set that has above changes. Please consider this
> for further review.
>

A few comments:
In the case below, should the Create node be above Gather?
postgres=# explain create table t7 as select * from t6;
                            QUERY PLAN
-------------------------------------------------------------------
 Gather  (cost=0.00..9.17 rows=0 width=4)
   Workers Planned: 2
   ->  Create t7
         ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
(4 rows)

Can we change it to something like:
-------------------------------------------------------------------
 Create t7
   ->  Gather  (cost=0.00..9.17 rows=0 width=4)
         Workers Planned: 2
         ->  Parallel Seq Scan on t6  (cost=0.00..9.17 rows=417 width=4)
(4 rows)

You could change intoclause_len = strlen(intoclausestr) to
intoclause_len = strlen(intoclausestr) + 1 and use intoclause_len in the
remaining places. That way the +1 can be dropped everywhere else.
+    /* Estimate space for into clause for CTAS. */
+    if (IS_CTAS(intoclause) && OidIsValid(objectid))
+    {
+        intoclausestr = nodeToString(intoclause);
+        intoclause_len = strlen(intoclausestr);
+        shm_toc_estimate_chunk(&pcxt->estimator, intoclause_len + 1);
+        shm_toc_estimate_keys(&pcxt->estimator, 1);
+    }
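
To illustrate the idea outside the patch, here is a minimal standalone
sketch, not the patch code itself. SerializedClause and serialize_clause()
are hypothetical names, and malloc() only stands in for the shared-memory
allocation. The point is that the length is computed once, with the
terminating NUL already counted, and then reused without any further +1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the serialized into clause; in the patch the
 * string comes from nodeToString(intoclause).
 */
typedef struct SerializedClause
{
	char	   *buf;	/* copy of the string, terminating NUL included */
	size_t		len;	/* bytes stored in buf, terminator included */
} SerializedClause;

/*
 * Compute the length once, with the terminator already counted, and reuse
 * it for both the allocation and the copy.
 */
static SerializedClause
serialize_clause(const char *intoclausestr)
{
	SerializedClause result;
	size_t		intoclause_len;

	assert(intoclausestr != NULL);
	intoclause_len = strlen(intoclausestr) + 1;

	/* malloc() is only a placeholder for a shared-memory allocation. */
	result.buf = malloc(intoclause_len);
	result.len = 0;
	if (result.buf != NULL)
	{
		memcpy(result.buf, intoclausestr, intoclause_len);
		result.len = intoclause_len;
	}
	return result;
}
```

With this shape, the estimate, the allocation and the copy all see the same
byte count, so there is no place left where a +1 could be forgotten.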

Can we use node->nworkers_launched == 0 in place of
node->need_to_scan_locally? That way, the setting and resetting of
node->need_to_scan_locally can be removed, unless need_to_scan_locally
is needed in any of the functions that get called.
+    /* Enable leader to insert in case no parallel workers were launched. */
+    if (node->nworkers_launched == 0)
+        node->need_to_scan_locally = true;
+
+    /*
+     * By now, for parallel workers (if launched any), would have started their
+     * work i.e. insertion to target table. In case the leader is chosen to
+     * participate for parallel inserts in CTAS, then finish its share before
+     * going to wait for the parallel workers to finish.
+     */
+    if (node->need_to_scan_locally)
+    {
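
A rough standalone model of the two variants, just to make the question
concrete. GatherModel and both function names are hypothetical, and only the
two fields discussed above are modelled; the real node state has more to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down model of the node state touched by the hunk. */
typedef struct GatherModel
{
	int			nworkers_launched;
	bool		need_to_scan_locally;
} GatherModel;

/* Hunk as posted: set the flag when no workers started, then test the flag. */
static bool
leader_inserts_with_flag(GatherModel *node)
{
	assert(node != NULL);
	if (node->nworkers_launched == 0)
		node->need_to_scan_locally = true;
	return node->need_to_scan_locally;
}

/* Suggested form: test the worker count directly and leave the flag alone. */
static bool
leader_inserts_without_flag(const GatherModel *node)
{
	assert(node != NULL);
	return node->nworkers_launched == 0;
}
```

The two agree when no workers were launched. They differ when workers were
launched and need_to_scan_locally was already true for some other reason,
which is the case the "unless" above is asking about.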

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
