From: | vignesh C <vignesh21(at)gmail(dot)com> |
---|---|
To: | Greg Nancarrow <gregn4422(at)gmail(dot)com> |
Cc: | Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel INSERT (INTO ... SELECT ...) |
Date: | 2021-01-08 09:24:56 |
Message-ID: | CALDaNm3SG=D=2UHtOtzAG8i+AbUxsX6Z371c-bJ-X9h7xw5CYg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Dec 11, 2020 at 4:30 PM Greg Nancarrow <gregn4422(at)gmail(dot)com> wrote:
>
> Posting an updated set of patches to address recent feedback:
>
> - Removed conditional-locking code used in parallel-safety checking
> code (Tsunakawa-san feedback). It turns out that for the problem test
> case, no parallel-safety checking should be occurring that locks
> relations because those inserts are specifying VALUES, not an
> underlying SELECT, so the parallel-safety checking code was updated to
> bail out early if no underlying SELECT is specified for the INSERT. No
> change to the test code was required.
> - Added comment to better explain the reason for treating "INSERT ...
> ON CONFLICT ... DO UPDATE" as parallel-unsafe (Dilip)
> - Added assertion to heap_prepare_insert() (Amit)
> - Updated error handling for NULL index_expr_item case (Vignesh)
Thanks Greg for fixing and posting a new patch.
Few comments:
+-- Test INSERT with underlying query.
+-- (should create plan with parallel SELECT, Gather parent node)
+--
+explain(costs off) insert into para_insert_p1 select unique1,
stringu1 from tenk1;
+ QUERY PLAN
+----------------------------------------
+ Insert on para_insert_p1
+ -> Gather
+ Workers Planned: 4
+ -> Parallel Seq Scan on tenk1
+(4 rows)
+
+insert into para_insert_p1 select unique1, stringu1 from tenk1;
+select count(*), sum(unique1) from para_insert_p1;
+ count | sum
+-------+----------
+ 10000 | 49995000
+(1 row)
+
For one of the test you can validate that the same transaction has
been used by all the parallel workers, you could use something like
below to validate:
SELECT COUNT(*) FROM (SELECT DISTINCT cmin,xmin FROM para_insert_p1) as dt;
Few includes are not required:
#include "executor/nodeGather.h"
+#include "executor/nodeModifyTable.h"
#include "executor/nodeSubplan.h"
#include "executor/tqueue.h"
#include "miscadmin.h"
@@ -60,6 +61,7 @@ ExecInitGather(Gather *node, EState *estate, int eflags)
GatherState *gatherstate;
Plan *outerNode;
TupleDesc tupDesc;
+ Index varno;
This include is not required in nodeModifyTable.c
+#include "catalog/index.h"
+#include "catalog/indexing.h"
@@ -43,7 +49,11 @@
#include "parser/parse_agg.h"
#include "parser/parse_coerce.h"
#include "parser/parse_func.h"
+#include "parser/parsetree.h"
+#include "partitioning/partdesc.h"
+#include "rewrite/rewriteHandler.h"
#include "rewrite/rewriteManip.h"
+#include "storage/lmgr.h"
#include "tcop/tcopprot.h"
The includes indexing.h, rewriteHandler.h & lmgr.h is not required in clauses.c
There are few typos:
+ table and populate it can use a parallel plan. Another
exeption is the command
+ <literal>INSERT INTO ... SELECT ...</literal> which can use a
parallel plan for
+ the underlying <literal>SELECT</literal> part of the query.
exeption should be exception
+ /*
+ * For the number of workers to use for a parallel
+ * INSERT/UPDATE/DELETE, it seems resonable to use the same number
+ * of workers as estimated for the underlying query.
+ */
+ parallelModifyWorkers = path->parallel_workers;
resonable should be reasonable
Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Smith | 2021-01-08 09:25:15 | Re: Single transaction in the tablesync worker? |
Previous Message | Bharath Rupireddy | 2021-01-08 09:24:50 | Re: EXPLAIN/EXPLAIN ANALYZE REFRESH MATERIALIZED VIEW |