From: | Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> |
---|---|
To: | Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: WIP: [[Parallel] Shared] Hash |
Date: | 2017-03-18 01:30:23 |
Message-ID: | CAEepm=2fE0UBOXzaBvvW4HsQZDQG4MpHBFai_T0iou0oA_VBPw@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Mar 14, 2017 at 8:03 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Mon, Mar 13, 2017 at 8:40 PM, Rafia Sabih
> <rafia(dot)sabih(at)enterprisedb(dot)com> wrote:
>> In an attempt to test v7 of this patch on TPC-H 20 scale factor I found a
>> few regressions,
>> Q21: 52 secs on HEAD and 400 secs with this patch
>
> Thanks Rafia. Robert just pointed out off-list that there is a bogus
> 0 row estimate in here:
>
> -> Parallel Hash Semi Join (cost=1006599.34..1719227.30 rows=0
> width=24) (actual time=38716.488..100933.250 rows=7315896 loops=5)
>
> Will investigate, thanks.

There are two problems here.

1. There is a pre-existing cardinality estimate problem for
semi-joins with <> filters. The big Q21 regression reported by Rafia
is caused by that phenomenon, probably exacerbated by another bug that
allowed 0 cardinality estimates to percolate inside the planner.
Since her report, commit 1ea60ad6 has clamped estimates to a minimum
of 1.0.
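
To illustrate why a 0-row estimate is harmful and what clamping buys,
here is a minimal sketch in C. It is not the code from commit 1ea60ad6:
both function names are hypothetical, and the cost formula is a toy
stand-in for whatever the planner really computes above a join.

```c
/*
 * Hypothetical sketch, NOT the code from commit 1ea60ad6.
 * Keep a row estimate from dropping below one row.
 */
double
clamp_row_estimate_sketch(double nrows)
{
    /* The negated comparison also sends NaN to the minimum. */
    if (!(nrows >= 1.0))
        return 1.0;
    return nrows;
}

/*
 * Toy cost model (an assumption for illustration only): a parent node
 * that does per_row_cost units of work for every row its child emits.
 */
double
toy_parent_cost_sketch(double child_rows, double per_row_cost)
{
    return child_rows * per_row_cost;
}
```

With an unclamped estimate of 0 rows the toy parent looks free no
matter how expensive its per-row work is, so any plan stacked on top of
it can look attractive. Once the estimate is clamped, the per-row cost
is at least counted once.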

I started a new thread to discuss that because it's unrelated to this
patch, except insofar as it confuses the planner about Q21 (with or
without parallelism). Using one possible selectivity tweak suggested
by Tom Lane, I was able to measure significant speedups on otherwise
unpatched master:

2. If you compare master tweaked as above against the latest version
of my patch series with the tweak, then the patched version always
runs faster with 4 or more workers, but with only 1 or 2 workers Q21
is a bit slower... but not always. I realised that there was a
bi-modal distribution of execution times. It looks like my 'early
exit' protocol, designed to make tuple-queue deadlock impossible, is
often causing us to lose a worker. I am working on that now.

I have code changes for Peter G's and Andres's feedback queued up and
will send a v8 series shortly, hopefully with a fix for problem 2
above.
--
Thomas Munro
http://www.enterprisedb.com