From: | Rushabh Lathia <rushabh(dot)lathia(at)gmail(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> |
Subject: | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Date: | 2017-09-19 10:21:29 |
Message-ID: | CAGPqQf3a79OOVX=N3eqY37aqVVw3kQdKs+RQxv8D1ga8=XM2mg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 22, 2017 at 3:19 AM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com
> wrote:
> On Wed, Mar 22, 2017 at 10:03 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
> wrote:
> > On Tue, Mar 21, 2017 at 3:50 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> >> I disagree with that. It is a
> >> trade-off, I suppose. I have now run out of time to work through it
> >> with you or Thomas, though.
> >
> > Bummer.
>
> I'm going to experiment with refactoring the v10 parallel CREATE INDEX
> patch to use the SharedBufFileSet interface from
> hj-shared-buf-file-v8.patch today and see what problems I run into.
>
>
As per the earlier discussion in the thread, I did experiment using
BufFileSet interface from parallel-hash-v18.patchset. I took the reference
of parallel-hash other patches to understand the BufFileSet APIs, and
incorporate the changes to parallel create index.
In order to achieve the same:
- Applied 0007-Remove-BufFile-s-isTemp-flag.patch and
0008-Add-BufFileSet-for-sharing-temporary-files-between-b.patch from the
parallel-hash-v18.patchset.
- Removed the buffile.c/logtap.c/fd.c changes from the parallel CREATE
INDEX v10 patch.
- incorporate the BufFileSet API to the parallel tuple sort for CREATE
INDEX.
- Changes into few existing functions as well as added few to support the
BufFileSet changes.
To check the performance, I used the similar test which Peter posted in
his earlier thread. which is:
Machine: power2 machine with 512GB of RAM
Setup:
CREATE TABLE parallel_sort_test AS
SELECT hashint8(i) randint,
md5(i::text) collate "C" padding1,
md5(i::text || '2') collate "C" padding2
FROM generate_series(0, 1e9::bigint) i;
vacuum ANALYZE parallel_sort_test;
postgres=# show max_parallel_workers_per_gather;
max_parallel_workers_per_gather
---------------------------------
8
(1 row)
postgres=# show maintenance_work_mem;
maintenance_work_mem
----------------------
8GB
(1 row)
postgres=# show max_wal_size ;
max_wal_size
--------------
4GB
(1 row)
CREATE INDEX serial_idx ON parallel_sort_test (randint);
*Without patch:*
Time: 3430054.220 ms (57:10.054)
*With patch (max_parallel_workers_maintenance = 8):*
Time: 1163445.271 ms (19:23.445)
Thanks to my colleague Thomas Munro for his help and off-line discussion
for the patch.
Attaching v11 patch and trace_sort output for the test.
Thanks,
Rushabh Lathia
www.EnterpriseDB.com
Attachment | Content-Type | Size |
---|---|---|
0001-Add-parallel-B-tree-index-build-sorting_v11.patch | text/x-patch | 151.1 KB |
trace_sort.output | application/octet-stream | 35.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dilip Kumar | 2017-09-19 10:41:50 | Re: path toward faster partition pruning |
Previous Message | Alvaro Herrera | 2017-09-19 10:20:05 | Re: Partition-wise join for join between (declaratively) partitioned tables |