Quick Links

Re: Parallel copy

From:	Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc:	Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alastair Turner <minion(at)decodable(dot)me>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject:	Re: Parallel copy
Date:	2020-06-15 11:09:04
Message-ID:	CALj2ACVR4WE98Per1H7ajosW8vafN16548O2UV8bG3p4D3XnPg@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi,

Attached is the patch supporting parallel copy for binary format files.

The performance improvement achieved with different workers is as shown
below. Dataset used has 10million tuples and is of 5.3GB size.

parallel workers test case 1(exec time in sec): copy from binary file, 2
indexes on integer columns and 1 index on text column test case 2(exec time
in sec): copy from binary file, 1 gist index on text column test case
3(exec time in sec): copy from binary file, 3 indexes on integer columns
0 1106.899(1X) 772.758(1X) 171.338(1X)
1 1094.165(1.01X) 757.365(1.02X) 163.018(1.05X)
2 618.397(1.79X) 428.304(1.8X) 117.508(1.46X)
4 320.511(3.45X) 231.938(3.33X) 80.297(2.13X)
8 172.462(6.42X) 150.212(5.14X) *71.518(2.39X)*
16 110.460(10.02X) *124.929(6.18X)* 91.308(1.88X)
20 *98.470(11.24X)* 137.313(5.63X) 95.289(1.79X)
30 109.229(10.13X) 173.54(4.45X) 95.799(1.78X)

Design followed for developing this patch:

Leader reads data from the file into the DSM data blocks each of 64K size.
It also identifies each tuple data block id, start offset, end offset,
tuple size and updates this information in the ring data structure. Workers
parallely read the tuple information from the ring data structure, the
actual tuple data from the data blocks and parallely insert the tuples into
the table.

Please note that this patch can be applied on the series of patches that
were posted previously[1] for parallel copy for csv/text files.
The correct order to apply all the patches is -
0001-Copy-code-readjustment-to-support-parallel-copy.patch
<https://www.postgresql.org/message-id/attachment/111463/0001-Copy-code-readjustment-to-support-parallel-copy.patch>
0002-Framework-for-leader-worker-in-parallel-copy.patch
<https://www.postgresql.org/message-id/attachment/111465/0002-Framework-for-leader-worker-in-parallel-copy.patch>
0003-Allow-copy-from-command-to-process-data-from-file-ST.patch
<https://www.postgresql.org/message-id/attachment/111464/0003-Allow-copy-from-command-to-process-data-from-file-ST.patch>
0004-Documentation-for-parallel-copy.patch
<https://www.postgresql.org/message-id/attachment/111466/0004-Documentation-for-parallel-copy.patch>
and
0005-Parallel-Copy-For-Binary-Format-Files.patch

The above tests were run with the configuration attached config.txt, which
is the same used for performance tests of csv/text files posted earlier in
this mail chain.

Request the community to take this patch up for review along with the
parallel copy for csv/text file patches and provide feedback.

[1] -
https://www.postgresql.org/message-id/CALDaNm3uyHpD9sKoFtB0EnMO8DLuD6H9pReFm%3Dtm%3D9ccEWuUVQ%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment	Content-Type	Size
config.txt	text/plain	1.2 KB
0005-Parallel-Copy-For-Binary-Format-Files.patch	application/octet-stream	25.3 KB

In response to

Re: Parallel copy at 2020-06-13 04:12:20 from Amit Kapila

Responses

Re: Parallel copy at 2020-06-18 13:11:57 from vignesh C

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Ashutosh Bapat	2020-06-15 11:15:25	Re: factorial of negative numbers
Previous Message	Amit Kapila	2020-06-15 11:02:22	Re: Transactions involving multiple postgres foreign servers, take 2