Re: Parallel INSERT (INTO ... SELECT ...)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Cc: Greg Nancarrow <gregn4422(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-05 05:14:59
Message-ID: CA+HiwqEr4z_5iWd2L_nOxG=8S=imCYFum+_RmgE36UyK7grhKg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

While reviewing the v14 set of patches (will send my comments
shortly), I too had some reservations on how 0001 decided to go about
setting hasModifyingCTE.

On Fri, Feb 5, 2021 at 1:51 PM Hou, Zhijie <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com> wrote:
> > > I took a look into the hasModifyingCTE bugfix recently, and found a
> > > possible bug case without the parallel insert patch.
> > >
> > > ---------------------------------
> > > drop table if exists test_data1;
> > > create table test_data1(a int, b int) ; insert into test_data1 select
> > > generate_series(1,1000), generate_series(1,1000); set
> > > force_parallel_mode=on;
> > >
> > > CREATE TEMP TABLE bug6051 AS
> > > select i from generate_series(1,3) as i;
> > >
> > > SELECT * FROM bug6051;
> > > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as
> > > i from test_data1;
> > >
> > > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051
> > > SELECT * FROM t1;
> > >
> > > *******
> > > ***ERROR: cannot assign XIDs during a parallel operation
> > > *******
> > > ---------------------------------
> > >
> > > I debugged it and it did have modifycte in the parsetree after rewrite.
> > > I think if we can properly set the hasModifyingCTE, we can avoid this
> > error by not consider parallel for this.
> > >
> >
> > Thanks. You've identified that the bug exists for SELECT too. I've verified
> > that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> > Are you able to review my bugfix?
> > Since the problem exists for SELECT in the current Postgres code, I'd like
> > to pull that bugfix out and provide it as a separate fix.

+1, a separate patch for this seems better.

> > My concern is that there may well be a better way to fix the issue - for
> > example, during the re-writing, rather than after the query has been
> > re-written.
> Hi,
>
> I took a look at the fix and have some thoughts on it.
> (Please correct me if you have tried this idea and found something is wrong)
>
> IMO, the main reason for the hasModifyingCTE=false is that:
> the Rewriter did not update the hasModifyingCTE when copying the existsing 'cteList' to the rewrited one.
>
> It seems there is only one place where ctelist will be copied separately.
> -------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
> /* OK, it's safe to combine the CTE lists */
> sub_action->cteList = list_concat(sub_action->cteList,
> copyObject(parsetree->cteList));
> + sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> --------
>
> Based on the above, if we update the hasModifyingCTE here, we may solve this problem.
>
> And there is another point here, the sub_action may be not the final parsetree.
> If defined the rule like "DO INSTEAD insert into xx select xx from xx", Rewriter will
> Put the ctelist into subquery in parsetree's rtable.
> In this case, we may also need to update the final parsetree too.
> (I think you know this case, I found same logic in the latest patch)
>
> --------
> static Query *
> rewriteRuleAction(Query *parsetree,
> ...
> if (sub_action_ptr)
> + {
> *sub_action_ptr = sub_action;
> + rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
> + }
> --------
>
> And the Basic test passed.
> What do you think ?

That is very close to what I was going to suggest, which is this:

diff --git a/src/backend/rewrite/rewriteHandler.c
b/src/backend/rewrite/rewriteHandler.c
index 0672f497c6..3c4417af98 100644
--- a/src/backend/rewrite/rewriteHandler.c
+++ b/src/backend/rewrite/rewriteHandler.c
@@ -631,6 +631,8 @@ rewriteRuleAction(Query *parsetree,
checkExprHasSubLink((Node *) rule_action->returningList);
}

+ rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+
return rule_action;
}

--
Amit Langote
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-05 05:22:54 Re: Single transaction in the tablesync worker?
Previous Message Hou, Zhijie 2021-02-05 04:51:19 RE: Parallel INSERT (INTO ... SELECT ...)