From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Prabhat Sahu <prabhat(dot)sahu(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Oleg Golovanov <rentech(at)mail(dot)ru>
Subject: Re: Parallel Hash take II
Date: 2017-09-14 14:01:42
Message-ID: CAEepm=3Q5krR05K76gtSsv=p+2HOyRBf4UX8SobPufKDCrPj4A@mail.gmail.com
Lists: pgsql-hackers
On Thu, Sep 14, 2017 at 11:57 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Sep 14, 2017 at 12:51 AM, Prabhat Sahu
> <prabhat(dot)sahu(at)enterprisedb(dot)com> wrote:
>> Setting with lower "shared_buffers" and "work_mem" as below, query getting crash but able to see explain plan.
>
> Thanks Prabhat. A small thinko in the batch reset code means that it
> sometimes thinks the shared skew hash table is present and tries to
> probe it after batch 1. I have a fix for that and I will post a new
> patch set just as soon as I have a good regression test figured out.
Fixed in the attached version, by adding a missing
"hashtable->shared->num_skew_buckets = 0;" to ExecHashFreeSkewTable().
I did some incidental tidying of the regression tests, but didn't
manage to find a version of your example small enough to include as a
regression test. I also discovered some other things:
1. Multi-batch Parallel Hash Join could occasionally produce a
resowner warning about a leaked temporary File associated with
SharedTupleStore objects. Fixed by making sure we call routines that
close all file handles in ExecHashTableDetach().
2. Since last time I tested, a lot fewer TPCH queries choose a
Parallel Hash plan. Not sure why yet. Possibly because Gather Merge
and other things got better. Will investigate.
3. Gather Merge and Parallel Hash Join may have a deadlock problem:
Gather Merge needs to block waiting for tuples, while workers wait
for all participants (including the leader) to reach barriers. TPCH
Q18 (with a certain set of indexes and settings, YMMV) has Gather
Merge over Sort over Parallel Hash Join, and although it usually runs
successfully I have observed one deadlock. Ouch. This seems to be a
more fundamental problem than the blocked TupleQueue scenario. Not
sure what to do about that.
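To illustrate the shape of the circular wait described in point 3, here
is a minimal sketch (not PostgreSQL code; names and timeouts are mine):
a "leader" blocks consuming tuples from a queue, while a "worker" waits
at a barrier that the leader must also reach before any tuples can be
produced. Neither side can make progress; timeouts stand in for the hang
so the demo terminates.

```python
import threading
import queue

barrier = threading.Barrier(2)   # all participants, leader included, must arrive
tuples = queue.Queue()           # the leader consumes sorted output from here

def worker():
    # The worker waits at the barrier before producing tuples, mirroring
    # a worker waiting for every participant at a hash join phase barrier.
    try:
        barrier.wait(timeout=1)
        tuples.put("tuple")      # never reached: leader never arrives
    except threading.BrokenBarrierError:
        pass                     # timeout stands in for the real deadlock

t = threading.Thread(target=worker)
t.start()

# The leader blocks waiting for tuples (as Gather Merge must) instead of
# advancing to the barrier, so no tuples can ever arrive.
try:
    tuples.get(timeout=2)
    deadlocked = False
except queue.Empty:
    deadlocked = True

t.join()
print("deadlock:", deadlocked)   # → deadlock: True
```

Each side holds the condition the other is waiting on, which is why this
is harder to escape than a blocked TupleQueue: the barrier wait has no
way to notice that the leader is parked in a consumer wait.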
--
Thomas Munro
http://www.enterprisedb.com
Attachment: parallel-hash-v20.patchset.tgz (application/x-gzip, 76.2 KB)