Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2020-10-09 09:09:29
Message-ID: CAA4eK1L9bR5N6M4Sjn1k--yvZ9Y2DBWNWUyjCJMzf1E0kX5=sw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 6, 2020 at 3:08 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> On Mon, Oct 5, 2020 at 10:36 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> Also, I have attached a separate patch (requested by Andres Freund)
> that just allows the underlying SELECT part of "INSERT INTO ... SELECT
> ..." to be parallel.
>

It might be a good idea to first just get this patch committed, if
possible. So, I have reviewed the latest version of this patch:

0001-InsertParallelSelect
1.
ParallelContext *pcxt;

+ /*
+ * We need to avoid an attempt on INSERT to assign a
+ * FullTransactionId whilst in parallel mode (which is in
+ * effect due to the underlying parallel query) - so the
+ * FullTransactionId is assigned here. Parallel mode must
+ * be temporarily escaped in order for this to be possible.
+ * The FullTransactionId will be included in the transaction
+ * state that is serialized in the parallel DSM.
+ */
+ if (estate->es_plannedstmt->commandType == CMD_INSERT)
+ {
+ Assert(IsInParallelMode());
+ ExitParallelMode();
+ GetCurrentFullTransactionId();
+ EnterParallelMode();
+ }
+

This looks like a hack to me. I think you are doing this to avoid the
parallel mode checks in GetNewTransactionId(), right? If so, I have
already mentioned above [1] that we can change it so that we disallow
assigning xids for parallel workers only. The same is true for the
check in ExecGatherMerge. Do you see any problem with that suggestion?

2.
@@ -337,7 +337,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
*/
if ((cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
IsUnderPostmaster &&
- parse->commandType == CMD_SELECT &&
+ (parse->commandType == CMD_SELECT || parse->commandType == CMD_INSERT) &&
!parse->hasModifyingCTE &&
max_parallel_workers_per_gather > 0 &&
!IsParallelWorker())

I think the comments above this need to be updated especially the part
where we says:"Note that we do allow CREATE TABLE AS, SELECT INTO, and
CREATE MATERIALIZED VIEW to use parallel plans, but as of now, only
the leader backend writes into a completely new table.". Don't we need
to include Insert also?

3.
@@ -371,6 +371,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
* parallel-unsafe, or else the query planner itself has a bug.
*/
glob->parallelModeNeeded = glob->parallelModeOK &&
+ (parse->commandType == CMD_SELECT) &&
(force_parallel_mode != FORCE_PARALLEL_OFF);

Why do you need this change? The comments above this code should be
updated to reflect this change. I think for the same reason the below
code seems to be modified but I don't understand the reason for the
below change as well, also it is better to update the comments for
this as well.

@@ -425,7 +426,7 @@ standard_planner(Query *parse, const char
*query_string, int cursorOptions,
* Optionally add a Gather node for testing purposes, provided this is
* actually a safe thing to do.
*/
- if (force_parallel_mode != FORCE_PARALLEL_OFF && top_plan->parallel_safe)
+ if (force_parallel_mode != FORCE_PARALLEL_OFF && parse->commandType
== CMD_SELECT && top_plan->parallel_safe)
{
Gather *gather = makeNode(Gather);

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BE-pM0U6qw7EOF0yO0giTxdErxoJV9xTqN%2BLo9zdotFQ%40mail.gmail.com

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2020-10-09 09:22:22 Re: Parallel copy
Previous Message Greg Nancarrow 2020-10-09 09:06:33 Re: Parallel INSERT (INTO ... SELECT ...)