From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-08-18 13:38:55
Message-ID: CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg@mail.gmail.com
Lists: pgsql-hackers
On Tue, Aug 18, 2020 at 1:37 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Jul 14, 2020 at 1:20 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > I think we can do more than this by parallelizing the Insert part
> > > of this query as well, since we have lifted the group locking
> > > restrictions related to RelationExtension and Page locks in PG13.
> > > It would be really cool to do that unless we see any fundamental
> > > problems with it.
> >
> > +1, I think it will be cool to support the insert part here, as
> > well as the insert part in 'Create Table As Select ...'.
> >
>
> +1 to parallelize inserts. Currently, ExecInsert() and CTAS use
> table_tuple_insert(). If we parallelize these parts, each worker will
> be inserting its tuples (one tuple at a time) into the same data page
> until space runs out; then, once any one of the workers obtains a new
> data page, the others might start inserting into it. This way, will
> there be lock contention on data pages?
>
It is possible, but we need to check how much of a bottleneck that is,
because it should not be a big part of the operation. And it won't be
any worse than inserts via multiple backends. I think it is important
to do it that way; otherwise, some of the pages can remain half-empty.
Right now, the plan for Insert ... Select is like:

Insert on <tbl_x>
  -> Seq Scan on <tbl_y>
     ....

In the above, the scan could be an index scan as well. What we want is:

Gather
  -> Insert on <tbl_x>
     -> Seq Scan on <tbl_y>
        ....
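
To make that shape concrete, here is a minimal sketch (not actual
PostgreSQL source; worker_insert_loop, targetRel, and outerPlan are
illustrative names) of the per-tuple loop each parallel worker would
run once the Insert node sits below Gather:

#include "postgres.h"
#include "access/tableam.h"
#include "executor/executor.h"

/*
 * Minimal sketch, not PostgreSQL source: the loop each parallel worker
 * would run if the Insert node were pushed below Gather.  The worker
 * pulls tuples from its share of the parallel scan and inserts them
 * itself instead of funneling them back to the leader.
 */
static void
worker_insert_loop(Relation targetRel, PlanState *outerPlan, EState *estate)
{
    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(outerPlan);

        if (TupIsNull(slot))
            break;

        /* one tuple at a time, as ExecInsert() does today */
        table_tuple_insert(targetRel, slot, estate->es_output_cid,
                           0 /* options */, NULL /* no bulk-insert state */);
    }
}

With that shape, the page-contention question above reduces to how often
concurrent table_tuple_insert() calls land on the same target page.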
> Do we also need to make inserts use table_multi_insert() (the way
> "COPY" does) instead of table_tuple_insert()?
>
I am not sure at this stage, but if it turns out to be a big problem,
we might think of inventing some way to allow individual workers to
operate on different pages. I think even without that we should be
able to make a big gain, as reads, filtering, etc. can be done in
parallel.
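
If the per-tuple path does become a problem, a buffered variant along
the lines of what COPY does could look roughly like the sketch below
(again illustrative only; MAX_BUFFERED_TUPLES, worker_buffered_insert,
and the slot bookkeeping are made up, but table_multi_insert() and the
bulk-insert state are the real APIs COPY uses):

#include "postgres.h"
#include "access/heapam.h"      /* GetBulkInsertState() */
#include "access/tableam.h"
#include "executor/executor.h"
#include "utils/rel.h"

#define MAX_BUFFERED_TUPLES 1000    /* illustrative batch size */

/*
 * Rough sketch, not PostgreSQL source: each worker buffers tuples in
 * worker-local slots and flushes them in batches via
 * table_multi_insert(), the way COPY does, instead of calling
 * table_tuple_insert() once per tuple.
 */
static void
worker_buffered_insert(Relation rel, PlanState *outerPlan, EState *estate)
{
    TupleTableSlot *slots[MAX_BUFFERED_TUPLES];
    int         nused = 0;
    BulkInsertState bistate = GetBulkInsertState();

    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(outerPlan);

        if (TupIsNull(slot))
            break;

        /* copy into a worker-local slot so the tuple survives buffering */
        slots[nused] = table_slot_create(rel, NULL);
        ExecCopySlot(slots[nused], slot);

        if (++nused == MAX_BUFFERED_TUPLES)
        {
            table_multi_insert(rel, slots, nused, estate->es_output_cid,
                               0 /* options */, bistate);
            while (nused > 0)
                ExecDropSingleTupleTableSlot(slots[--nused]);
        }
    }

    /* flush any remaining buffered tuples */
    if (nused > 0)
    {
        table_multi_insert(rel, slots, nused, estate->es_output_cid,
                           0 /* options */, bistate);
        while (nused > 0)
            ExecDropSingleTupleTableSlot(slots[--nused]);
    }

    FreeBulkInsertState(bistate);
}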
--
With Regards,
Amit Kapila.