From: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Avoiding hash join batch explosions with extreme skew and weird stats |
Date: | 2019-07-03 21:22:09 |
Message-ID: | CAAKRu_ZkkukQgXCK8ADe-PmvcmpZh6G1Uin8pqqovL4x7P30mQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Jun 18, 2019 at 3:24 PM Melanie Plageman <melanieplageman(at)gmail(dot)com>
wrote:
>
> These questions will probably make a lot more sense with corresponding
> code, so I will follow up with the second version of the state machine
> patch once I finish it.
>
>
I have changed the state machine and resolved the questions I had
raised in the previous email. This seems to work for the parallel and
non-parallel cases. I have not yet rewritten the unmatched outer tuple
status as a bitmap in a spill file (for ease of debugging).
Before doing that, I wanted to ask what a desirable fallback condition
would be. In this patch, fallback to hashloop join happens only when
inserting tuples into the hashtable after batch 0 when inserting
another tuple from the batch file would exceed work_mem. This means
you can't increase nbatches, which, I would think is undesirable.
I thought a bit about when fallback should happen. So, let's say that
we would like to fallback to hashloop join when we have increased
nbatches X times. At that point, since we do not want to fall back to
hashloop join for all batches, we have to make a decision. After
increasing nbatches the Xth time, do we then fall back for all batches
for which inserting inner tuples exceeds work_mem? Do we use this
strategy but work_mem + some fudge factor?
Or, do we instead try to determine if data skew led us to increase
nbatches both times and then determine which batch, given new
nbatches, contains that data, set fallback to true only for that
batch, and let all other batches use the existing logic (with no
fallback option) unless they contain a value which leads to increasing
nbatches X number of times?
--
Melanie Plageman
Attachment | Content-Type | Size |
---|---|---|
v2-0001-hashloop-fallback.patch | text/x-patch | 30.5 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | David Rowley | 2019-07-03 22:06:11 | Re: Index Skip Scan |
Previous Message | Alvaro Herrera | 2019-07-03 21:18:48 | Re: [PATCH][PROPOSAL] Add enum releation option type |