RE: Parallel INSERT (INTO ... SELECT ...)

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: Greg Nancarrow <gregn4422(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, "Amit Langote" <amitlangote09(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-05 04:51:19
Message-ID: e52dc77ad0994aafbb30cb716418d134@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > I took a look into the hasModifyingCTE bugfix recently, and found a
> > possible bug case without the parallel insert patch.
> >
> > ---------------------------------
> > drop table if exists test_data1;
> > create table test_data1(a int, b int) ; insert into test_data1 select
> > generate_series(1,1000), generate_series(1,1000); set
> > force_parallel_mode=on;
> >
> > CREATE TEMP TABLE bug6051 AS
> > select i from generate_series(1,3) as i;
> >
> > SELECT * FROM bug6051;
> > CREATE RULE bug6051_ins AS ON INSERT TO bug6051 DO INSTEAD select a as
> > i from test_data1;
> >
> > WITH t1 AS ( DELETE FROM bug6051 RETURNING * ) INSERT INTO bug6051
> > SELECT * FROM t1;
> >
> > *******
> > ***ERROR: cannot assign XIDs during a parallel operation
> > *******
> > ---------------------------------
> >
> > I debugged it and it did have modifycte in the parsetree after rewrite.
> > I think if we can properly set the hasModifyingCTE, we can avoid this
> error by not consider parallel for this.
> >
>
> Thanks. You've identified that the bug exists for SELECT too. I've verified
> that the issue is fixed by the bugfix included in the Parallel INSERT patch.
> Are you able to review my bugfix?
> Since the problem exists for SELECT in the current Postgres code, I'd like
> to pull that bugfix out and provide it as a separate fix.
> My concern is that there may well be a better way to fix the issue - for
> example, during the re-writing, rather than after the query has been
> re-written.
Hi,

I took a look at the fix and have some thoughts on it.
(Please correct me if you have tried this idea and found something is wrong)

IMO, the main reason for the hasModifyingCTE=false is that:
the Rewriter did not update the hasModifyingCTE when copying the existsing 'cteList' to the rewrited one.

It seems there is only one place where ctelist will be copied separately.
-------
static Query *
rewriteRuleAction(Query *parsetree,
...
/* OK, it's safe to combine the CTE lists */
sub_action->cteList = list_concat(sub_action->cteList,
copyObject(parsetree->cteList));
+ sub_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
--------

Based on the above, if we update the hasModifyingCTE here, we may solve this problem.

And there is another point here, the sub_action may be not the final parsetree.
If defined the rule like "DO INSTEAD insert into xx select xx from xx", Rewriter will
Put the ctelist into subquery in parsetree's rtable.
In this case, we may also need to update the final parsetree too.
(I think you know this case, I found same logic in the latest patch)

--------
static Query *
rewriteRuleAction(Query *parsetree,
...
if (sub_action_ptr)
+ {
*sub_action_ptr = sub_action;
+ rule_action->hasModifyingCTE |= parsetree->hasModifyingCTE;
+ }
--------

And the Basic test passed.
What do you think ?

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2021-02-05 05:14:59 Re: Parallel INSERT (INTO ... SELECT ...)
Previous Message Thomas Munro 2021-02-05 04:47:16 Re: shared tempfile was not removed on statement_timeout