Re: Problems with hash join over nested loop

From: Jim Nasby <jnasby(at)enova(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Problems with hash join over nested loop
Date: 2013-10-29 16:21:26
Message-ID: 526FE086.4020903@enova.com
Lists: pgsql-performance

On 10/29/13 9:10 AM, Merlin Moncure wrote:
> On Mon, Oct 28, 2013 at 6:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Jim Nasby <jnasby(at)enova(dot)com> writes:
>>> I've been working on trying to normalize a table that's got a bunch of text fields. Normalizing the first 4 has been a non-issue. But when I try and normalize 2 additional fields a bunch of query plans go belly-up.
>>
>> Try increasing join_collapse_limit/from_collapse_limit. I'm a bit
>> confused by your description but I think maybe you've got more than 8
>> relations in the subqueries.
>
> Hm -- wondering out loud if there would be any value in terms of
> decorating explain output when that limit was hit and if it's
> practical to do so...
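
(Side note for anyone following along: the two limits Tom mentions are ordinary GUCs, so checking and raising them for a single session is a one-liner each. A minimal sketch:)

-- Rough sketch: inspect the current values, then raise them for this session only.
SHOW from_collapse_limit;        -- 8 by default
SHOW join_collapse_limit;        -- 8 by default
SET from_collapse_limit = 99;
SET join_collapse_limit = 99;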

I think the community would *love* any method of noting potential performance problems. Hitting the GEQO limit fits in there as well. We could eventually warn about other things too, like going just over work_mem or seqscanning a big table for a small number of rows.

I'm also wondering if it's time to raise those limits. To test this, I constructed a somewhat contrived query against our schema. It's a legitimate join path... I can't see why anyone would use the *full* path, but smaller sections of it are definitely in use. It's basically all joins, with one simple filter on top.

I'd rather not share the actual query or plan, but:

grep -i scan temp.txt | wc -l
28

All tests were done via EXPLAIN ... in psql with \timing turned on (a rough sketch of a single run follows the numbers below). I ignored obvious outliers; margin of error is ~5% from what I saw:

Default config: 21ms
geqo = off: 19ms
geqo = off, from_collapse_limit = 99: 19ms
from_collapse_limit = 99: 21ms
join_collapse_limit = 99: 171ms
both collapse limits = 99: 176ms
geqo = off, join_collapse_limit = 99: 1.2s
both collapse limits = 99, geqo = off: 1.2s
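
For clarity, each run above boiled down to something like this in psql (the actual query is omitted, as above; the SETs were varied per the list):

\timing on
SET geqo = off;                   -- or left at the default, depending on the run
SET from_collapse_limit = 99;     -- or left at the default, depending on the run
SET join_collapse_limit = 99;     -- or left at the default, depending on the run
EXPLAIN SELECT ...;               -- the 28-scan query (not shown); psql's "Time:" line is what's reported above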

Obviously there are cases where 1.2 seconds of planning time will kill you... but if you're that time-sensitive and joining 28 tables, I think it's reasonable to expect people to do some hand tuning! :)

Conversely, the cases where you're likely to reach that sheer number of tables are the ones where execution will take a non-trivial amount of time anyway. In this particular case, if I limit the query to a single row (via blah_id = 2, not via LIMIT), it takes ~2ms to execute when cached with full optimization (interestingly, planning time was about 926ms at that point).

Now that looks horrible... 926ms to plan a query that takes 2ms to return. But I'm not even going to bother with the 20ms plan, because it's going to take minutes if not HOURS to run (it just full-scans everything it can find).
--
Jim Nasby, Lead Data Architect (512) 569-9461
