Quick Links

Re: bad estimation together with large work_mem generates terrible slow hash joins

From:	Tomas Vondra <tv(at)fuzzy(dot)cz>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: bad estimation together with large work_mem generates terrible slow hash joins
Date:	2014-07-13 20:43:06
Message-ID:	53C2EF5A.4030405@fuzzy.cz
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 3.7.2014 19:36, Tomas Vondra wrote:
> On 1.7.2014 01:24, Tomas Vondra wrote:
>> On 30.6.2014 23:12, Tomas Vondra wrote:
>>> Hi,
>>
>> Hopefully I got it right this time. At least it seems to be working
>> for cases that failed before (no file leaks, proper rowcounts so
>> far).
>
> Attached v7, fixing nbatch/ntuples in an assert.
>
> regards
> Tomas

Attached is v8 of this patch, significantly reworked based on the
related discussion (mostly in 'tweaking NTUP_PER_BUCKET').

With the patch applied, the hash table works like this:

initial sizing (ExecChooseHashTableSize)
----------------------------------------
- size nbuckets for NTUP_PER_BUCKET=1
- account for memory allocated for buckets

building the hash table
-----------------------
- while adding tuples, keep track of optimal number of buckets (for the
current number of tuples per batch)
- don't resize until all the tuples are added (by increasing nbatch the
number of buckets may decrease)
- after adding all tuples, consider resizing (optimal nbuckets !=
initial nbuckets)
- do the resize

This means that for well estimated plans (reasonably exact tuple count
for the inner relation), the hash table is properly sized and no resize
is needed.

For badly estimated plans, this works about the same as the previous
patches (we're accounting for memory needed for buckets etc.).

More batches or less buckets?
-----------------------------
In the related threads, I repeatedly mentioned that I don't know a good
answer to "Should I add more batches or decrease the number of buckets?"
while sizing the hash table. The current code (as in HEAD) does not face
this dillema because (a) it does not count buckets into work_mem and (b)
does not allow changing nbuckets.

I still don't have a good answer to this question, but I think the patch
can stand even without it. If you want more/less batches, use
smaller/larger work_mem - same as with the current code. Also, with the
'dense allocation' patch [1], it's possible to use larger work_mem
values (and thus compensate for counting buckets into work_mem).

Changes since v7
----------------

(a) remove NTUP_GROW_THRESHOLD (just use NTUP_PER_BUCKET directly)
(b) set NTUP_PER_BUCKET=1
(c) include buckets (optimal) when considering nbatch increase
(d) track optimal number of buckets (for NTUP_PER_BUCKET)
(e) after loading all the tuples, resize the hash table to get
nbuckets_optimal (if needed)
(f) renamed GUC to enable_hash_resize (makes a bit more sense)

Comments
--------

(a) NTUP_GROW_THRESHOLD was overcomplicated and mostly a result of
misunderstanding how NTUP_PER_BUCKET works (it's upper threshold,
not average) and is not really needed.

(b) The value "1" gives the best performance, but also requires more
memory for the buckets. This should be more than compensated by
densely allocating the tuples (see hashjoin-alloc-v3.patch
in the "tweaking NTUP_PER_BUCKET" thread [1]).

(c,d) Originally, the memory needed for buckets was not included in
'spaceUsed', but because the NTUP_PER_BUCKET=1 change this is
not really acceptable (needs much more memory than before).
So now it's included both in the initial sizing of the hash
table, and when adding more tuples (nbuckets_optimal).

Still, spaceUsed does not include palloc overhead (which may be
a significant amount of memory). This is tackled by [1].

(e) No change here.

(f) A bit better GUC name. Anyway, this is for easier development
only, and won't be included in the final patch.

regards
Tomas

[1] http://www.postgresql.org/message-id/53C2DECC.2010408@fuzzy.cz

Attachment	Content-Type	Size
hashjoin-nbuckets-growth-v8.patch	text/x-diff	16.5 KB

In response to

Re: bad estimation together with large work_mem generates terrible slow hash joins at 2014-07-03 17:36:24 from Tomas Vondra

Responses

Re: bad estimation together with large work_mem generates terrible slow hash joins at 2014-07-20 16:29:34 from Tomas Vondra

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Craig Ringer	2014-07-14 03:47:10	Re: [JDBC] Setting PG-version without recompiling
Previous Message	Magnus Hagander	2014-07-13 20:35:13	Re: SSL information view