From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2020-12-07 10:35:50 |
Message-ID: | CAA4eK1+Yyiu2ecXbVfumK3ZvELV1G5fmWP1V0S0YuewHD9nSQg@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Dec 7, 2020 at 3:44 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Mon, Dec 7, 2020 at 2:55 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Mon, Dec 7, 2020 at 11:32 AM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> > >
> > > + if (!(root->parse->isForCTAS &&
> > > +       root->query_level == 1))
> > > +     run_cost += parallel_tuple_cost * path->path.rows;
> > >
> > > I noticed that parallel_tuple_cost will still be ignored
> > > when Gather is not the top node.
> > >
> > > Example:
> > > Create table test(i int);
> > > insert into test values(generate_series(1,10000000,1));
> > > explain create table ntest3 as select * from test where i < 200 limit 10000;
> > >
> > > QUERY PLAN
> > > -------------------------------------------------------------------------------
> > > Limit (cost=1000.00..97331.33 rows=1000 width=4)
> > >   -> Gather (cost=1000.00..97331.33 rows=1000 width=4)
> > >        Workers Planned: 2
> > >        -> Parallel Seq Scan on test (cost=0.00..96331.33 rows=417 width=4)
> > >             Filter: (i < 200)
> > >
> > >
> > > isForCTAS will be true because this is a CREATE TABLE AS, and
> > > query_level is always 1 because there is no subquery.
> > > So even if Gather is not the top node, parallel_tuple_cost will still be ignored.
> > >
> > > Does that work as expected?
> > >
> >
> > I don't think that is expected, and it is not the case without this patch.
> > The cost shouldn't be changed for existing cases where the write is
> > not pushed to workers.
> >
>
> Thanks for pointing that out. Yes, it should not change for the cases
> where parallel inserts will not be picked later.
>
> Any better suggestions on how to make the planner consider that the
> CTAS might choose parallel inserts later at the same time avoiding the
> above issue in case it doesn't?
>
Why do we need to check query_level if 'isForCTAS' is set only
when Gather is the top node?
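
To make the concern concrete, here is a toy sketch in C. It is neither the
patch nor real planner code: ToyGatherInfo, gather_is_top and the helper
functions are hypothetical names invented for illustration. It contrasts the
condition from the quoted hunk with a stricter one that skips
parallel_tuple_cost only when Gather really is the top node.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for planner state (not PostgreSQL's structs). */
typedef struct ToyGatherInfo
{
    bool is_for_ctas;   /* query is the SELECT part of CREATE TABLE AS */
    int  query_level;   /* 1 means the outermost query */
    bool gather_is_top; /* assumption: true only if Gather is the topmost node */
} ToyGatherInfo;

/* Condition as written in the quoted hunk: ignores where Gather sits in the plan. */
static bool
hunk_skips_tuple_cost(const ToyGatherInfo *g)
{
    return g->is_for_ctas && g->query_level == 1;
}

/* Stricter alternative: skip the cost only when Gather is the top node. */
static bool
strict_skips_tuple_cost(const ToyGatherInfo *g)
{
    return g->is_for_ctas && g->gather_is_top;
}

/* Add the per-tuple transfer cost unless the caller decided to skip it. */
static double
toy_gather_run_cost(double base_run_cost, double parallel_tuple_cost,
                    double rows, bool skip_tuple_cost)
{
    if (skip_tuple_cost)
        return base_run_cost;
    return base_run_cost + parallel_tuple_cost * rows;
}
```

For the Limit-over-Gather plan above (is_for_ctas = true, query_level = 1,
gather_is_top = false), the first condition skips the cost while the second
still charges it, which is the behavior the existing non-CTAS cases have.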
--
With Regards,
Amit Kapila.