From: | David Kimura <david(dot)g(dot)kimura(at)gmail(dot)com> |
---|---|
To: | Melanie Plageman <melanieplageman(at)gmail(dot)com> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesse Zhang <sbjesse(at)gmail(dot)com>, dkimura(at)pivotal(dot)io |
Subject: | Re: Avoiding hash join batch explosions with extreme skew and weird stats |
Date: | 2020-04-29 23:44:53 |
Message-ID: | CAHnPFjSV8u=D85RnorugR-5-RR73msDghuQ1sRRnwbVa6S-Oyg@mail.gmail.com |
Lists: | pgsql-hackers |
On Wed, Apr 29, 2020 at 4:39 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> In addition to many assorted TODOs in the code, there are a few major
> projects left:
> - Batch 0 falling back
> - Stripe barrier deadlock
> - Performance improvements and testing
>
Batch 0 never spills. That behavior is an artifact of the existing design, which,
as an optimization, special-cases batch 0 to fill the initial hash table. This
means batch 0 can skip the load step and doesn't need a batch file.
However, in the pathological case where all tuples hash to batch 0, there is no
way to redistribute those tuples to other batches. So the existing hash join
implementation allows work_mem to be exceeded for batch 0.
In the adaptive hash join approach, there is another way to deal with a batch
that exceeds work_mem. If increasing the number of batches does not help, the
batch can be split into stripes that each fit within work_mem. Doing this
requires spilling the excess tuples to batch files. The following patch adds
logic to create a batch 0 file for serial hash join so that even in the
pathological case we do not need to exceed work_mem.
Thanks,
David
Attachment | Content-Type | Size |
---|---|---|
v6-0002-Implement-fallback-of-batch-0-for-serial-adaptive.patch | application/octet-stream | 4.8 KB |