From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-08-18 13:38:55
Message-ID: CAA4eK1Ks8Sqs29VHPS6koNj5E9YQdkGCzgGsSrQMeUbQfe28yg@mail.gmail.com
Lists: pgsql-hackers
On Tue, Aug 18, 2020 at 1:37 PM Bharath Rupireddy
<bharath(dot)rupireddyforpostgres(at)gmail(dot)com> wrote:
>
> On Tue, Jul 14, 2020 at 1:20 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, Jul 13, 2020 at 4:23 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > > I think we can do more than this by parallelizing the Insert part
> > > of this query as well, since we have lifted the group locking
> > > restrictions related to RelationExtension and Page locks in PG13.
> > > It would be really cool to do that unless we see any fundamental
> > > problems with it.
> >
> > +1, I think it will be cool to support the insert part here, as
> > well as the insert part in 'Create Table As Select ...'.
> >
>
> +1 to parallelize inserts. Currently, ExecInsert() and CTAS use
> table_tuple_insert(). If we parallelize these parts, each worker will
> be inserting its tuples (one tuple at a time) into the same data page
> until space runs out; then, once any one of the workers obtains a new
> data page, the others might start inserting into it. This way, will
> there be lock contention on data pages?
>
It is possible, but we need to check how much of a bottleneck that is,
because it should not be a big part of the operation. And it won't be
any worse than inserts via multiple backends. I think it is important
to do it that way; otherwise, some of the pages can remain half-empty.
Right now, the plan for Insert ... Select is like:

Insert on <tbl_x>
  -> Seq Scan on <tbl_y>
     ....

In the above, the scan could be an index scan as well. What we want is:

Gather
  -> Insert on <tbl_x>
     -> Seq Scan on <tbl_y>
        ....
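
To make that shape concrete, here is a minimal sketch (not actual
PostgreSQL source; worker_insert_loop, targetRel, and outerPlan are
illustrative names) of the per-tuple loop each parallel worker would
run once the Insert node sits below Gather:

#include "postgres.h"
#include "access/tableam.h"
#include "executor/executor.h"

/*
 * Minimal sketch, not PostgreSQL source: the loop each parallel worker
 * would run if the Insert node were pushed below Gather.  The worker
 * pulls tuples from its share of the parallel scan and inserts them
 * itself instead of funneling them back to the leader.
 */
static void
worker_insert_loop(Relation targetRel, PlanState *outerPlan, EState *estate)
{
    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(outerPlan);

        if (TupIsNull(slot))
            break;

        /* one tuple at a time, as ExecInsert() does today */
        table_tuple_insert(targetRel, slot, estate->es_output_cid,
                           0 /* options */, NULL /* no bulk-insert state */);
    }
}

With that shape, the page-contention question above reduces to how often
concurrent table_tuple_insert() calls land on the same target page.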
> Do we also need to make inserts use table_multi_insert() (the way
> "COPY" does) instead of table_tuple_insert()?
>
I am not sure at this stage, but if it turns out to be a big problem,
we might think of inventing some way to allow individual workers to
operate on different pages. I think even without that we should be
able to make a big gain, as reads, filtering, etc. can be done in
parallel.
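
If the per-tuple path does become a problem, a buffered variant along
the lines of what COPY does could look roughly like the sketch below
(again illustrative only; MAX_BUFFERED_TUPLES, worker_buffered_insert,
and the slot bookkeeping are made up, but table_multi_insert() and the
bulk-insert state are the real APIs COPY uses):

#include "postgres.h"
#include "access/heapam.h"      /* GetBulkInsertState() */
#include "access/tableam.h"
#include "executor/executor.h"
#include "utils/rel.h"

#define MAX_BUFFERED_TUPLES 1000    /* illustrative batch size */

/*
 * Rough sketch, not PostgreSQL source: each worker buffers tuples in
 * worker-local slots and flushes them in batches via
 * table_multi_insert(), the way COPY does, instead of calling
 * table_tuple_insert() once per tuple.
 */
static void
worker_buffered_insert(Relation rel, PlanState *outerPlan, EState *estate)
{
    TupleTableSlot *slots[MAX_BUFFERED_TUPLES];
    int         nused = 0;
    BulkInsertState bistate = GetBulkInsertState();

    for (;;)
    {
        TupleTableSlot *slot = ExecProcNode(outerPlan);

        if (TupIsNull(slot))
            break;

        /* copy into a worker-local slot so the tuple survives buffering */
        slots[nused] = table_slot_create(rel, NULL);
        ExecCopySlot(slots[nused], slot);

        if (++nused == MAX_BUFFERED_TUPLES)
        {
            table_multi_insert(rel, slots, nused, estate->es_output_cid,
                               0 /* options */, bistate);
            while (nused > 0)
                ExecDropSingleTupleTableSlot(slots[--nused]);
        }
    }

    /* flush any remaining buffered tuples */
    if (nused > 0)
    {
        table_multi_insert(rel, slots, nused, estate->es_output_cid,
                           0 /* options */, bistate);
        while (nused > 0)
            ExecDropSingleTupleTableSlot(slots[--nused]);
    }

    FreeBulkInsertState(bistate);
}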
--
With Regards,
Amit Kapila.