From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Explanation for bug #13908: hash joins are badly broken
Date: 2016-02-07 16:20:58
Message-ID: 56B76EEA.80404@2ndquadrant.com
Lists: pgsql-hackers

On 02/06/2016 10:22 PM, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>> What about using the dense allocation even for the skew buckets,
>> but not one context for all skew buckets but one per bucket? Then
>> when we delete a bucket, we simply destroy the context (and free
>> the chunks, just like we do with the current dense allocator).
>
> Yeah, I was wondering about that too, but it only works if you have
> quite a few tuples per skew bucket, else you waste a lot of space.
> And you were right upthread that what we're collecting is keys
> expected to be common in the outer table, not the inner table. So
> it's entirely likely that the typical case is just one inner tuple
> per skew bucket. (Should check that out though ...)

I'd argue this is true for the vast majority of joins, because joins
tend to be on foreign keys with the "dimension" table as the inner side,
thus having exactly one row per skew bucket. The upper bound on the
number of skew buckets is the statistics target (i.e. the maximum number
of MCV items), so either 100 (the default) or up to 10000 (the maximum).

For tuples wider than 8kB we have no problem at all, because those
allocations are treated as separate chunks and free()d immediately,
making the memory reusable by the dense allocator. If the tuples are
narrower than 8kB, the amount of memory the skew hash can hold is rather
limited (800kB with the default statistics target, 80MB in the extreme
case with the maximum number of MCV items).
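
For illustration, here is a standalone model of that allocation policy
(a sketch with made-up names and simplified bookkeeping, not the actual
dense_alloc() from nodeHash.c):

#include <stdlib.h>

#define CHUNK_SIZE      (32 * 1024)        /* shared chunk size */
#define CHUNK_THRESHOLD (CHUNK_SIZE / 4)   /* 8kB cutoff for over-sized tuples */

typedef struct Chunk
{
    struct Chunk *next;                    /* list of shared chunks */
    size_t        used;                    /* bytes consumed so far */
    char          data[CHUNK_SIZE];
} Chunk;

typedef struct
{
    Chunk *chunks;
} DenseAllocator;

static void *
dense_alloc_sketch(DenseAllocator *da, size_t size)
{
    /* over-sized tuples get their own allocation, free()-able right away */
    if (size > CHUNK_THRESHOLD)
        return malloc(size);

    /* open a new shared chunk if the current one can't fit the tuple */
    if (da->chunks == NULL || da->chunks->used + size > CHUNK_SIZE)
    {
        Chunk *c = malloc(sizeof(Chunk));

        c->next = da->chunks;
        c->used = 0;
        da->chunks = c;
    }

    /* carve the tuple out of the current chunk (no MAXALIGN here) */
    void *ptr = da->chunks->data + da->chunks->used;

    da->chunks->used += size;
    return ptr;
}

With 32kB chunks, a tuple larger than a quarter of a chunk would waste
too much of the shared space, hence the separate allocation above the
threshold.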

So perhaps in this case we don't really need to worry about the
accounting and memory usage too much.

That of course does not mean we should not try to do better in cases
where the number of tuples per skew bucket really is high; no doubt we
can construct such joins. If we could estimate the number of tuples per
skew bucket, that would allow us to handle this case differently, as in
the sketch below.
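
To make that concrete, here's a standalone model of the per-bucket
ownership idea from upthread (again made-up names, not PostgreSQL
code): each skew bucket owns its tuples, so dropping the bucket frees
everything it holds in one pass:

#include <stdlib.h>

typedef struct BucketAlloc
{
    struct BucketAlloc *next;       /* links all allocations of a bucket */
} BucketAlloc;

typedef struct
{
    BucketAlloc *allocs;            /* everything this skew bucket owns */
} SkewBucket;

/* allocate tuple storage owned by the bucket */
static void *
skew_bucket_alloc(SkewBucket *bucket, size_t size)
{
    BucketAlloc *a = malloc(sizeof(BucketAlloc) + size);

    a->next = bucket->allocs;
    bucket->allocs = a;
    return a + 1;                   /* usable memory follows the header */
}

/* dropping the bucket releases all its tuples at once */
static void
skew_bucket_destroy(SkewBucket *bucket)
{
    while (bucket->allocs != NULL)
    {
        BucketAlloc *next = bucket->allocs->next;

        free(bucket->allocs);
        bucket->allocs = next;
    }
}

In PostgreSQL terms the owner would simply be a small per-bucket memory
context under batchCxt, with MemoryContextDelete() playing the role of
skew_bucket_destroy(). The cost is the per-context overhead, which is
why this only pays off with more than a handful of tuples per bucket.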

FWIW, there was a patch from David some time ago identifying "unique"
joins, where the join key is unique in the inner relation. That might be
useful for this, I guess.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
