From: Greg Nancarrow <gregn4422(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
Subject: Re: Parallel copy
Date: 2020-09-03 06:50:31
Message-ID: CAJcOf-fea8MrCV+2M=w8jMjp3L7UwJWtEyFsRXV5-gMHxr89CQ@mail.gmail.com
Lists: pgsql-hackers
On Wed, Sep 2, 2020 at 3:40 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> I have attached the scripts that I used for the test results I
> mentioned in my previous mail. create.sql file has the table that I
> used, insert_data_gen.txt has the insert data generation scripts. I
> varied the count in insert_data_gen to generate csv files of 1GB, 2GB
> & 5GB & varied the data to generate 1 char, 10 char & 100 char for
> each column for various testing. You can rename insert_data_gen.txt to
> insert_data_gen.sh & generate the csv file.
Hi Vignesh,
I used your script and table definition, multiplying the number of
records to produce 5GB and 9.5GB CSV files.
I got the following results:
(1) Postgres default settings, 5GB CSV (530000 rows):
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy       132.197           -
Parallel Copy
(#workers)
  1                98.428          1.34
  2                52.753          2.51
  3                37.630          3.51
  4                33.554          3.94
  5                33.636          3.93
  6                33.821          3.91
  7                34.270          3.86
  8                34.465          3.84
  9                34.315          3.85
 10                33.543          3.94
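(The "Load factor" column is just the normal copy duration divided by
the parallel copy duration; e.g. for case (1) with 4 workers:)

```sql
-- Load factor = normal copy duration / parallel copy duration,
-- e.g. case (1) with 4 workers:
SELECT round(132.197 / 33.554, 2);   -- 3.94
```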
(2) Postgres increased resources, 5GB CSV (530000 rows):
shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size = 2GB
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy       131.835           -
Parallel Copy
(#workers)
  1                98.301          1.34
  2                53.261          2.48
  3                37.868          3.48
  4                34.224          3.85
  5                33.831          3.90
  6                34.229          3.85
  7                34.512          3.82
  8                34.303          3.84
  9                34.690          3.80
 10                34.479          3.82
(3) Postgres default settings, 9.5GB CSV (1000000 rows):
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy       248.503           -
Parallel Copy
(#workers)
  1               185.724          1.34
  2                99.832          2.49
  3                70.560          3.52
  4                63.328          3.92
  5                63.182          3.93
  6                64.108          3.88
  7                64.131          3.87
  8                64.350          3.86
  9                64.293          3.87
 10                63.818          3.89
(4) Postgres increased resources, 9.5GB CSV (1000000 rows):
shared_buffers = 20% of RAM (total RAM = 376GB) = 76GB
work_mem = 10% of RAM = 38GB
maintenance_work_mem = 10% of RAM = 38GB
max_worker_processes = 16
max_parallel_workers = 16
checkpoint_timeout = 30min
max_wal_size = 2GB
Copy Type       Duration (s)   Load factor
===============================================
Normal Copy       248.647           -
Parallel Copy
(#workers)
  1               182.236          1.36
  2                92.814          2.68
  3                67.347          3.69
  4                63.839          3.89
  5                62.672          3.97
  6                63.873          3.89
  7                64.930          3.83
  8                63.885          3.89
  9                62.397          3.98
 10                64.477          3.86
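For reference, each run above was invoked along the following lines (a
sketch only: the table name and file path are placeholders, and the
PARALLEL option is the syntax proposed by this patch series, not
something available in released PostgreSQL):

```sql
-- Serial baseline ("Normal Copy"); test_tbl and the path are placeholders:
COPY test_tbl FROM '/tmp/data_5gb.csv' WITH (FORMAT csv);

-- Parallel copy with 4 workers (proposed-patch syntax):
COPY test_tbl FROM '/tmp/data_5gb.csv' WITH (FORMAT csv, PARALLEL 4);
```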
So, as you found with this particular table definition and data, even
one parallel worker always performs better than a normal copy.
The different result I obtained for this case seems to be caused by
the following factors:
- a different table definition (I used a variety of column types)
- the amount of data per row (I used less data per row, so more rows
for the same data file size)
As I previously observed, if the target table has no indexes,
increasing resources beyond the default settings makes little
difference to the performance.
Regards,
Greg Nancarrow
Fujitsu Australia