From: | "Tomas Vondra" <tv(at)fuzzy(dot)cz> |
---|---|
To: | "Robert Haas" <robertmhaas(at)gmail(dot)com> |
Cc: | "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: bad estimation together with large work_mem generates terrible slow hash joins |
Date: | 2014-09-11 14:33:59 |
Message-ID: | 712b820f9627874e838fff667a409e63.squirrel@sq.gransy.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 11 Září 2014, 15:31, Robert Haas wrote:
> On Wed, Sep 10, 2014 at 5:09 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> OK. So here's v13 of the patch, reflecting this change.
>
> With the exception of ExecChooseHashTableSize() and a lot of stylistic
> issues along the lines of what I've already complained about, this
> patch seems pretty good to me. It does three things:
I believe I fixed the stylistic issues you pointed out, except maybe the
bracketing (which seems fine to me, though). So if you could point the
issues out, that'd be helpful.
>
> (1) It changes NTUP_PER_BUCKET to 1. Although this increases memory
> utilization, it also improves hash table performance, and the
> additional memory we use is less than what we saved as a result of
> commit 45f6240a8fa9d35548eb2ef23dba2c11540aa02a.
>
> (2) It changes ExecChooseHashTableSize() to include the effect of the
> memory consumed by the bucket array. If we care about properly
> respecting work_mem, this is clearly right for any NTUP_PER_BUCKET
> setting, but more important for a small setting like 1 than for the
> current value of 10. I'm not sure the logic in this function is as
> clean and simple as it can be just yet.
I made that as clear as I was able to (based on your feedback), but I'm
"stuck" as I'm familiar with the logic. That is not to say it can't be
improved, but this needs to be reviewed by someone else.
>
> (3) It allows the number of batches to increase on the fly while the
> hash join is in process. This case arises when we initially estimate
> that we only need a small hash table, and then it turns out that there
> are more tuples than we expect. Without this code, the hash table's
> load factor gets too high and things start to suck.
>
> I recommend separating this patch in two patches, the first covering
> items (1) and (2) and the second covering item (3), which really seems
> like a separate improvement.
That probably makes sense.
regards
Tomas
From | Date | Subject | |
---|---|---|---|
Next Message | Stephen Frost | 2014-09-11 14:42:50 | Re: proposal (9.5) : psql unicode border line styles |
Previous Message | Andres Freund | 2014-09-11 14:27:17 | Re: Memory Alignment in Postgres |