Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Zhihong Yu <zyu(at)yugabyte(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-01-04 09:36:49
Message-ID: CALj2ACWsRC9O+bpyEEgAg6NGRU7e7-c2jPE8vgZ5iW9TKfEVDw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 30, 2020 at 5:28 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> Few comments:
> -   /*
> -    * To allow parallel inserts, we need to ensure that they are safe to be
> -    * performed in workers. We have the infrastructure to allow parallel
> -    * inserts in general except for the cases where inserts generate a new
> -    * CommandId (eg. inserts into a table having a foreign key column).
> -    */
> -   if (IsParallelWorker())
> -       ereport(ERROR,
> -               (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> -                errmsg("cannot insert tuples in a parallel worker")));
>
> Is it possible to add a check here for whether this is a CTAS insert,
> since we do not support inserts in parallel workers from other
> commands as of now?

Currently, there's no global variable we can use to selectively skip
this error for parallel insertion in CTAS. How about adding a variable
to one of the worker global contexts, setting it when parallel
insertion is chosen for CTAS, and checking it in heap_prepare_insert()
to skip the above error? Eventually, we can remove this restriction
entirely once we fully allow parallelism for INSERT INTO SELECT,
CTAS, and COPY.
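
To make the idea concrete, here is a rough standalone sketch of the
check I have in mind. It is not patch code: in_parallel_worker,
parallel_ctas_insert, set_parallel_ctas_insert() and insert_allowed()
are hypothetical names made up for illustration, and the boolean return
value stands in for the ereport(ERROR) in heap_prepare_insert().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in PostgreSQL. */
bool in_parallel_worker = false;    /* models the result of IsParallelWorker() */
bool parallel_ctas_insert = false;  /* proposed per-worker flag */

/*
 * Would be called during worker startup when the leader has chosen
 * parallel insertion for CTAS. Only a worker is expected to set it.
 */
void
set_parallel_ctas_insert(bool allowed)
{
    assert(in_parallel_worker);
    parallel_ctas_insert = allowed;
}

/*
 * Models the check in heap_prepare_insert(): returns false where the
 * real code would raise "cannot insert tuples in a parallel worker".
 */
bool
insert_allowed(void)
{
    if (in_parallel_worker && !parallel_ctas_insert)
        return false;
    return true;
}
```

With this shape, the leader is never affected, a worker running any
other command still hits the error, and only a worker that was told it
is doing a CTAS insert gets through.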

Thoughts?

> +   Oid         objectid;   /* workers to open relation/table. */
> +   /* Number of tuples inserted by all the workers. */
> +   pg_atomic_uint64 processed;
>
> We can just mention relation instead of relation/table.

I will modify it in the next patch set.

> +select explain_pictas(
> +'create table parallel_write as select length(stringu1) from tenk1;');
> + explain_pictas
> +----------------------------------------------------------
> + Gather (actual rows=N loops=N)
> + Workers Planned: 4
> + Workers Launched: N
> + -> Create parallel_write
> + -> Parallel Seq Scan on tenk1 (actual rows=N loops=N)
> +(5 rows)
> +
> +select count(*) from parallel_write;
>
> Can we include a selection of cmin, xmin in one of the tests to verify
> that the parallel workers use the same transaction id,
> something like:
> select distinct(cmin,xmin) from parallel_write;

This is not possible: cmin and xmin are dynamic, so we cannot use
them in test cases. I also think it's not necessary to check whether
the leader and workers are in the same txn, since we are not
creating a new txn. All the txn state from the leader is serialized in
SerializeTransactionState and restored in
StartParallelWorkerTransaction.
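
As a simplified illustration of why the workers end up with the
leader's ids, the serialize/restore step can be pictured roughly as
below. This is only a toy model: DemoXactState, demo_serialize_xact()
and demo_restore_xact() are hypothetical names, and the real functions
in xact.c carry more state than the two fields shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified transaction state for illustration only. */
typedef struct DemoXactState
{
    uint32_t xid;         /* top-level transaction id */
    uint32_t command_id;  /* current command id */
} DemoXactState;

/* Leader side: copy the state into a shared buffer; returns bytes written. */
size_t
demo_serialize_xact(const DemoXactState *state, unsigned char *buf, size_t buflen)
{
    assert(buflen >= sizeof(*state));
    memcpy(buf, state, sizeof(*state));
    return sizeof(*state);
}

/* Worker side: rebuild the state from the shared buffer. */
void
demo_restore_xact(const unsigned char *buf, DemoXactState *out)
{
    memcpy(out, buf, sizeof(*out));
}
```

Since the worker never assigns ids of its own in this model, whatever
it writes is stamped with the values copied from the leader.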

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
