Re: INSERT INTO SELECT, Why Parallelism is not selected?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT INTO SELECT, Why Parallelism is not selected?
Date: 2020-07-29 13:47:54
Message-ID: CA+TgmoaN6eLRixb59DpH1=msrsUDCmVCOmOQa_5qLYhNypwbRg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jul 26, 2020 at 7:24 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> No, "git diff --check" doesn't help. I have tried pgindent, but that
> doesn't help either, nor was I expecting it to. I am still not
> able to figure out how I goofed this up, but I will spend some more time
> on this. In the meantime, I have updated the patch to improve the
> comments as suggested by Robert. Do let me know if you want to
> edit/add something more?

I still don't agree with this as proposed.

+ * For now, we don't allow parallel inserts of any form not even where the
+ * leader can perform the insert. This restriction can be uplifted once
+ * we allow the planner to generate parallel plans for inserts. We can

If I'm understanding this correctly, this logic is completely
backwards. We don't prohibit inserts here because we know the planner
can't generate them. We prohibit inserts here because, if the planner
somehow did generate them, it wouldn't be safe. You're saying that
it's not allowed because we don't try to do it yet, but actually it's
not allowed because we want to make sure that we don't accidentally
try to do it. That's very different.

+ * parallelize inserts unless they generate a new commandid (ex. inserts
+ * into a table having foreign key column) or lock tuples (ex. statements
+ * like Insert .. Select For Update).

I understand the part about generating new command IDs, but not the
part about locking tuples. Why would that be a problem? Can it be
better explained here?

Examples in comments are typically introduced with "e.g.", not "ex.".

+ * We should be able to parallelize
+ * the later case if we can ensure that no two parallel processes can ever
+ * operate on the same page.

I don't know whether this is talking about two processes operating on
the same page at the same time, or ever within a single query
execution. If it's the former, perhaps we need to explain why that's a
concern for parallel query but not otherwise; if it's the latter, that
seems impossible to guarantee and imagining that we'll ever be able to
do so seems like wishful thinking.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
