From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2020-10-14 12:46:35 |
Message-ID: | CAA4eK1KWSVYn8hxnTKZfcKiHxhqmj842V1tK9f88mYMCGgGxtQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 14, 2020 at 2:46 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Oct 6, 2020 at 10:58 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> >
> > While skimming through the patch, a small thing I noticed:
> > + /*
> > + * SELECT part of the CTAS is parallelizable, so we can make
> > + * each parallel worker insert the tuples that are resulted
> > + * in it's execution into the target table.
> > + */
> > + if (!is_matview &&
> > + IsA(plan->planTree, Gather))
> > + ((DR_intorel *) dest)->is_parallel = true;
> > +
> >
> > I am not sure at this stage if this is the best way to make CTAS as
> > parallel but if so, then probably you can expand the comments a bit to
> > say why you consider only Gather node (and that too when it is the
> > top-most node) and why not another parallel node like GatherMerge?
> >
>
> If somebody expects to preserve the order of the tuples that are
> coming from GatherMerge node of the select part in CTAS or SELECT INTO
> while inserting, now if parallelism is allowed, that may not be the
> case i.e. the order of insertion of tuples may vary. I'm not quite
> sure, if someone wants to use order by in the select parts of CTAS or
> SELECT INTO in a real world use case. Thoughts?
>
I think there is no reason why one can't use ORDER BY in the
statements we are talking about here. But, I think we can't enable
parallelism for GatherMerge is because for that node we always need to
fetch the data in the leader backend to perform the final merge phase.
So, I was expecting a small comment saying something on those lines.
>
> >
> > Need to analyze whether to allow parallelism if CTAS has prepared statements or with no data.
> >
>
> For prepared statements, the parallelism will not be picked and so is
> parallel insertion.
>
Hmm, I am not sure what makes you say this statement. The parallelism
is enabled for prepared statements since commit 57a6a72b6b.
>
> I'm listing the things that are still pending.
>
> 1. How to represent the parallel insert for CTAS in explain plans? The
> explain CTAS shows the plan for only the SELECT part. How about having
> some textual info along with the Gather node? I'm not quite sure on
> this point, any suggestions are welcome.
>
I am also not sure about this point because we don't display anything
for the DDL part in explain. Can you propose by showing some example
of what you have in mind?
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | vignesh C | 2020-10-14 12:54:39 | Re: Parallel copy |
Previous Message | Ajin Cherian | 2020-10-14 12:44:05 | Re: [HACKERS] logical decoding of two-phase transactions |