RE: Parallel INSERT (INTO ... SELECT ...)

From: "Hou, Zhijie" <houzj(dot)fnst(at)cn(dot)fujitsu(dot)com>
To: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)fujitsu(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Tang, Haiying" <tanghy(dot)fnst(at)cn(dot)fujitsu(dot)com>, Antonin Houska <ah(at)cybertec(dot)at>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Amit Langote <amitlangote09(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>
Subject: RE: Parallel INSERT (INTO ... SELECT ...)
Date: 2021-02-10 05:17:15
Message-ID: f47a6023dde14a4bb09a01bc17dfc41c@G08CNEXMBPEKD05.g08.fujitsu.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> > ----------------------
> > postgres=# select pg_size_pretty(pg_indexes_size('testscan_index'));
> > pg_size_pretty
> > ----------------
> > 4048 kB
> > (1 row)
> >
> > postgres=# select pg_size_pretty(pg_relation_size('testscan_index'));
> > pg_size_pretty
> > ----------------
> > 4768 kB
> > (1 row)
>
> Which of the above shows the table size? What does pg_indexes_size()
> against an index (testscan_index) return?

Sorry, Maybe the tablename is a little confused,
but 'testscan_index' is actually a table's name.

pg_indexes_size will return the index's size attatched to the table.
pg_relation_size will return the table's size.

Did I miss something ?

> > IMO, due to the difference of inserts with parallel execution, the
> > btree insert's cost is more than serial.
> >
> > At the same time, the parallel does not have a huge performance gain
> > with bitmapscan, So the extra cost of btree index will result in
> > performance degradation.
>
> How did you know that the parallelism didn't have a huge performance gain
> with bitmap scan?
>
> [serial]
> -> Bitmap Heap Scan on public.x (cost=3272.20..3652841.26 rows=79918
> width=8) (actual time=8.096..41.005 rows=129999 loops=1)
>
> [parallel]
> -> Parallel Bitmap Heap Scan on public.x
> (cost=3272.20..1260119.35 rows=19980 width=8) (actual time=5.832..14.787
> rows=26000 loops=5)

I tested the case without insert(Just the query use bitmapscan):

[serial]:
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on x (cost=3258.59..3647578.53 rows=81338 width=4) (actual time=8.091..34.222 rows=129999 loops=1)
Recheck Cond: ((a < 80000) OR (a > 199900000))
Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
Rows Removed by Filter: 50000
Heap Blocks: exact=975
-> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0) (actual time=7.964..7.965 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.451..3.451 rows=79999 loops=1)
Index Cond: (a < 80000)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.513..4.513 rows=100000 loops=1)
Index Cond: (a > 199900000)
Planning Time: 0.108 ms
Execution Time: 38.136 ms

[parallel]
postgres=# explain analyze select a from x where a<80000 or (a%2=0 and a>199900000);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=4258.59..1266704.42 rows=81338 width=4) (actual time=9.177..22.592 rows=129999 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Parallel Bitmap Heap Scan on x (cost=3258.59..1257570.62 rows=20334 width=4) (actual time=6.402..12.882 rows=26000 loops=5)
Recheck Cond: ((a < 80000) OR (a > 199900000))
Filter: ((a < 80000) OR (((a % 2) = 0) AND (a > 199900000)))
Rows Removed by Filter: 10000
Heap Blocks: exact=1
-> BitmapOr (cost=3258.59..3258.59 rows=173971 width=0) (actual time=8.785..8.786 rows=0 loops=1)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1495.11 rows=80872 width=0) (actual time=3.871..3.871 rows=79999 loops=1)
Index Cond: (a < 80000)
-> Bitmap Index Scan on x_a_idx (cost=0.00..1722.81 rows=93099 width=0) (actual time=4.914..4.914 rows=100000 loops=1)
Index Cond: (a > 199900000)
Planning Time: 0.158 ms
Execution Time: 26.951 ms
(15 rows)

It did have performance gain, but I think it's not huge enough to ignore the extra's index cost.
What do you think ?

Best regards,
houzj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2021-02-10 05:32:24 Re: Single transaction in the tablesync worker?
Previous Message tsunakawa.takay@fujitsu.com 2021-02-10 05:00:35 RE: Parallel INSERT (INTO ... SELECT ...)