Re: Parallel Inserts in CREATE TABLE AS

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Zhihong Yu <zyu(at)yugabyte(dot)com>, Luc Vlaming <luc(at)swarm64(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
Subject: Re: Parallel Inserts in CREATE TABLE AS
Date: 2021-05-31 06:48:35
Message-ID: CALj2ACVydhFUX1vNAb5zgwxepz1DVfop-L+Jm5+_hDjSBNhEhg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 29, 2021 at 9:46 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> While looking at results, I have observed one more thing that we are
> trying to parallelize I/O due to which we might not be seeing benefit
> in such cases. I think even for non-write queries there won't be any
> (much) benefit if we can't parallelize CPU usage. Basically, the test
> you are doing is for statement: explain analyze verbose create table
> test as select * from tenk1;. Now, in this statement, there is no
> qualification and still, the Gather node is generated for it, this
> won't be the case if we check "select * from tenk1". Is it due to the
> reason that the patch completely ignores the parallel_tuple_cost? But
> still, it should prefer a serial plan due parallel_setup_cost, why is
> that not happening? Anyway, I think we should not parallelize such
> queries where we can't parallelize CPU usage. Have you tried the cases
> without changing any of the costings for parallelism?

Hi,

I measured the execution timings for parallel inserts in CTAS in cases
where the planner chooses parallelism for selects naturally. This
means, I have used only 0001 patch from v23 patch set at [1]. I have
not used the 0002 patch that makes parallel_tuple_cost 0.

Query used for all these tests is below. Also, attached table creation
sqls in the file "test_cases".
EXPLAIN (ANALYZE, VERBOSE) create table test1 as select * from tenk1
t1, tenk2 t2 where t1.c1 = t2.d2;

All the results are of the form (number of workers, exec time in milli sec).

Test case 1: both tenk1 and tenk2 are of tables with 1 integer(of 4
bytes) columns, tuple size 28 bytes, 100mn tuples
master: (0, 277886.951 ms), (2, 171183.221 ms), (4, 159703.496 ms)
with parallel inserts CTAS patch: (0, 264709.186 ms), (2, 128354.448
ms), (4, 111533.731 ms)

Test case 2: both tenk1 and tenk2 are of tables with 2 integer(of 4
bytes each) columns, 3 varchar(8), tuple size 59 bytes, 100mn tuples
master: (0, 453505.228 ms), (2, 236762.759 ms), (4, 219038.126 ms)
with parallel inserts CTAS patch: (0, 470483.818 ms), (2, 219374.198
ms), (4, 203543.681 ms)

Test case 3: both tenk1 and tenk2 are of tables with 2 bigint(of 8
bytes each) columns, 3 name(of 64 bytes each) columns, 1 varchar(8),
tuple size 241 bytes, 100mn tuples
master: (0, 1052725.928 ms), (2, 592968.486 ms), (4, 562137.891 ms)
with parallel inserts CTAS patch: (0, 1019086.805 ms), (2, 634448.322
ms), (4, 680793.305 ms)

Test case 4: both tenk1 and tenk2 are of tables with 2 bigint(of 8
bytes each) columns, 16 name(of 64 bytes each) columns, tuple size
1064 bytes, 10mn tuples
master: (0, 371931.497 ms), (2, 247206.841 ms), (4, 241959.839 ms)
with parallel inserts CTAS patch: (0, 396342.329 ms), (2, 333860.472
ms), (4, 317895.558 ms)

Observation: parallel insert + parallel select gives good benefit wIth
very lesser tuple sizes, cases 1 and 2. If the tuple size is bigger
serial insert + parallel select fares better, cases 3 and 4.

In the coming days, I will try to work on more performance analysis
and clarify some of the points raised upthread.

[1] - https://www.postgresql.org/message-id/CALj2ACXVWr1o%2BFZrkQt-2GvYfuMQeJjWohajmp62Wr6BU8Y4VA%40mail.gmail.com
[2] - postgresql.conf changes I made:
shared_buffers = 40GB
max_worker_processes = 32
max_parallel_maintenance_workers = 24
max_parallel_workers = 32
synchronous_commit = on
checkpoint_timeout = 1d
max_wal_size = 24GB
min_wal_size = 15GB
autovacuum = off
wal_level = replica

With Regards,
Bharath Rupireddy.

Attachment Content-Type Size
test_cases application/octet-stream 4.2 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-05-31 06:50:56 Re: [BUG]Update Toast data failure in logical replication
Previous Message Michael Paquier 2021-05-31 06:31:08 Re: Multiple hosts in connection string failed to failover in non-hot standby mode