From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
---|---|
To: | Amit Langote <amitlangote09(at)gmail(dot)com> |
Cc: | Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: Determine parallel-safety of partition relations for Inserts |
Date: | 2021-01-18 03:04:38 |
Message-ID: | CAA4eK1JjT2AOMNUHHBVTeMBbvFObSveZWe5jE8bcvV5cd7yzoQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Sun, Jan 17, 2021 at 4:45 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
>
> On Sat, Jan 16, 2021 at 2:02 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Fri, Jan 15, 2021 at 6:45 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> > > On Fri, Jan 15, 2021 at 9:59 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > > We want to do this for Inserts where only Select can be parallel and
> > > > Inserts will always be done by the leader backend. This is actually
> > > > the case we first want to implement.
> > >
> > > Sorry, I haven't looked at the linked threads and the latest patches
> > > there closely enough yet, so I may be misreading this, but if the
> > > inserts will always be done by the leader backend as you say, then why
> > > does the planner need to be checking the parallel safety of the
> > > *target* table's expressions?
> > >
> >
> > The reason is that once we enter parallel-mode we can't allow
> > parallel-unsafe things (like allocation of new CIDs, XIDs, etc.). We
> > enter the parallel-mode at the beginning of the statement execution,
> > see ExecutePlan(). So, the Insert will be performed in parallel-mode
> > even though it happens in the leader backend. It is not possible that
> > we finish getting all the tuples from the gather node first and then
> > start inserting. Even, if we somehow find something to make this work
> > anyway the checks being discussed will be required to make inserts
> > parallel (where inserts will be performed by workers) which is
> > actually the next patch in the thread I mentioned in the previous
> > email.
> >
> > Does this answer your question?
>
> Yes, thanks for the explanation. I kind of figured that doing the
> insert part itself in parallel using workers would be a part of the
> end goal of this work, although that didn't come across immediately.
>
> It's a bit unfortunate that the parallel safety check of the
> individual partitions cannot be deferred until it's known that a given
> partition will be affected by the command at all. Will we need
> fundamental changes to how parallel query works to make that possible?
> If so, have such options been considered in these projects?
>
I think it is quite fundamental to how parallel query works and we
might not be able to change it due to various reasons like (a) it will
end up generating a lot of paths in optimizer when it is not safe to
do so and in the end, we won't use it. (b) If after inserting into a
few partitions we came to know that the next partition we are going to
insert has some parallel-unsafe constraints then we need to give up
the execution and restart the statement by again trying to first plan
it by having non-parallel paths. Now, we can optimize this by
retaining both parallel and non-parallel plans such that if we fail to
execute parallel-plan we can use a non-parallel plan to execute the
statement but still that doesn't seem like an advisable approach.
The extra time spent in optimizer will pay-off well by the parallel
execution. As pointer earlier, you can see one of the results shared
on the other thread [1]. The cases where it might not get the benefit
(say when the underlying plan is non-parallel) can have some impact
but still, we have not tested that in detail. The ideas we have
discussed so far to address that are (a) postpone parallel-safety
checks for partitions till there are some underneath partial paths
(from which parallel paths can be generated) but that has some
down-side in that we will end up generating partial paths when that is
really not required, (b) have a rel option like parallel_dml_workers
or use existing option parallel_workers to allow considering parallel
insert for a relation. Any better ideas?
> If such
> changes are not possible in the short term, like for v14, we should at
> least try to make sure that the eager checking of all partitions is
> only performed if using parallelism is possible at all.
>
As of now, we do first check if it is safe to generate a parallel plan
for underlying select (in Insert into .... Select ..) and then perform
parallel-safety checks for the DML. We can postpone it further as
suggested above in (a).
> I will try to take a look at the patches themselves to see if there's
> something I know that will help.
>
Thank you. It will be really helpful if you can do that.
--
With Regards,
Amit Kapila.
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Nancarrow | 2021-01-18 03:12:07 | Re: Parallel INSERT (INTO ... SELECT ...) |
Previous Message | torikoshia | 2021-01-18 03:00:22 | Re: adding wait_start column to pg_locks |