From: | Luc Vlaming <luc(at)swarm64(dot)com> |
---|---|
To: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
Cc: | Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Parallel Inserts in CREATE TABLE AS |
Date: | 2021-01-05 07:13:07 |
Message-ID: | ef02fe4e-b476-40e6-3fd6-3407caf82fb5@swarm64.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 04-01-2021 14:32, Bharath Rupireddy wrote:
> On Mon, Jan 4, 2021 at 4:22 PM Luc Vlaming <luc(at)swarm64(dot)com
> <mailto:luc(at)swarm64(dot)com>> wrote:
> > Sorry it took so long to get back to reviewing this.
>
> Thanks for the comments.
>
> > wrt v18-0001....patch:
> >
> > + /*
> > + * If the worker is for parallel insert in CTAS, then
> use the proper
> > + * dest receiver.
> > + */
> > + intoclause = (IntoClause *) stringToNode(intoclausestr);
> > + receiver = CreateIntoRelDestReceiver(intoclause);
> > + ((DR_intorel *)receiver)->is_parallel_worker = true;
> > + ((DR_intorel *)receiver)->object_id = fpes->objectid;
> > I would move this into a function called e.g.
> > GetCTASParallelWorkerReceiver so that the details wrt CTAS can be put in
> > createas.c.
> > I would then also split up intorel_startup into intorel_leader_startup
> > and intorel_worker_startup, and in GetCTASParallelWorkerReceiver set
> > self->pub.rStartup to intorel_worker_startup.
>
> My intention was to not add any new APIs to the dest receiver. I simply
> made the changes in intorel_startup, in which for workers it just does
> the minimalistic work and exit from it. In the leader most of the table
> creation and sanity check is kept untouched. Please have a look at the
> v19 patch posted upthread [1].
>
Looks much better, really nicely abstracted away in the v20 patch.
> > + volatile pg_atomic_uint64 *processed;
> > why is it volatile?
>
> Intention is to always read from the actual memory location. I referred
> it from the way pg_atomic_fetch_add_u64_impl,
> pg_atomic_compare_exchange_u64_impl, pg_atomic_init_u64_impl and their
> u32 counterparts use pass the parameter as volatile pg_atomic_uint64 *ptr.
>
Okay I had not seen this syntax before for atomics with the volatile
keyword but its apparently how the atomics abstraction works in postgresql.
> > + if (isctas)
> > + {
> > + intoclause = ((DR_intorel *)
> node->dest)->into;
> > + objectid = ((DR_intorel *)
> node->dest)->object_id;
> > + }
> > Given that you extract them each once and then pass them directly into
> > the parallel-worker, can't you instead pass in the destreceiver and
> > leave that logic to ExecInitParallelPlan?
>
> That's changed entirely in the v19 patch set posted upthread [1]. Please
> have a look. I didn't pass the dest receiver, to keep the API generic, I
> passed parallel insert command type and a void * ptr which points to
> insertion command because the information we pass to workers depends on
> the insertion command (for instance, the information needed by workers
> is for CTAS into clause and object id and for Refresh Mat View object id).
>
> >
> > + if
> (IS_PARALLEL_CTAS_DEST(gstate->dest) &&
> > + ((DR_intorel *)
> gstate->dest)->into->rel &&
> > + ((DR_intorel *)
> gstate->dest)->into->rel->relname)
> > why would rel and relname not be there? if no rows have been inserted?
> > because it seems from the intorel_startup function that that would be
> > set as soon as startup was done, which i assume (wrongly?) is always
> done?
>
> Actually, that into clause rel variable is always being set in the
> gram.y for CTAS, Create Materialized View and SELECT INTO (because
> qualified_name non-terminal is not optional). My bad. I just added it as
> a sanity check. Actually, it's not required.
>
> create_as_target:
> *qualified_name* opt_column_list table_access_method_clause
> OptWith OnCommitOption OptTableSpace
> {
> $$ = makeNode(IntoClause);
> * $$->rel = $1;*
> create_mv_target:
> *qualified_name* opt_column_list table_access_method_clause
> opt_reloptions OptTableSpace
> {
> $$ = makeNode(IntoClause);
> * $$->rel = $1;*
> into_clause:
> INTO OptTempTableName
> {
> $$ = makeNode(IntoClause);
> * $$->rel = $2;*
>
> I will change the below code:
> + if (GetParallelInsertCmdType(gstate->dest) ==
> + PARALLEL_INSERT_CMD_CREATE_TABLE_AS &&
> + ((DR_intorel *) gstate->dest)->into &&
> + ((DR_intorel *) gstate->dest)->into->rel &&
> + ((DR_intorel *) gstate->dest)->into->rel->relname)
> + {
>
> to:
> + if (GetParallelInsertCmdType(gstate->dest) ==
> + PARALLEL_INSERT_CMD_CREATE_TABLE_AS)
> + {
>
> I will update this in the next version of the patch set.
>
Thanks
> >
> > + * In case if no workers were launched, allow the leader to
> insert entire
> > + * tuples.
> > what does "entire tuples" mean? should it maybe be "all tuples"?
>
> Yeah, noticed that while working on the v19 patch set. Please have a
> look at the v19 patch posted upthread [1].
>
> > ================
> > wrt v18-0003....patch:
> >
> > not sure if it is needed, but i was wondering if we would want more
> > tests with multiple gather nodes existing? caused e.g. by using CTE's,
> > valid subquery's (like the one test you have, but without the group
> > by/having)?
>
> I'm not sure if we can have CTAS/CMV/SELECT INTO in CTEs like WITH, WITH
> RECURSIVE and I don't see that any of the WITH clause processing hits
> createas.c functions. So, IMHO, we don't need to add them. Please let me
> know if there are any specific use cases you have in mind.
>
> For instance, I tried to cover Init/Sub Plan and Subquery cases with:
>
> below case has multiple Gather, Init Plan:
> +-- parallel inserts must occur, as there is init plan that gets executed by
> +-- each parallel worker
> +select explain_pictas(
> +'create table parallel_write as select two col1,
> + (select two from (select * from tenk2) as tt limit 1) col2
> + from tenk1 where tenk1.four = 3;');
>
> below case has Gather, Sub Plan:
> +-- parallel inserts must not occur, as there is sub plan that gets
> executed by
> +-- the Gather node in leader
> +select explain_pictas(
> +'create table parallel_write as select two col1,
> + (select tenk1.two from generate_series(1,1)) col2
> + from tenk1 where tenk1.four = 3;');
>
> For multiple Gather node cases, I covered them with the Union All/Append
> cases in the 0004 patch. Please have a look.
>
Right, had not reviewed part 4 yet. My bad.
> [1] -
> https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com
> <https://www.postgresql.org/message-id/CALj2ACWth7mVQtqdYJwSn1mNmaHwxNE7YSYxRSLmfkqxRk%2Bzmg%40mail.gmail.com>
>
> With Regards,
> Bharath Rupireddy.
> EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com>
Kind regards,
Luc
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2021-01-05 07:17:34 | Re: pg_rewind restore_command issue in PG12 |
Previous Message | Michael Paquier | 2021-01-05 07:02:07 | Re: Track replica origin progress for Rollback Prepared |