Re: Problems with hash join over nested loop

From: Jim Nasby <jnasby(at)enova(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Problems with hash join over nested loop
Date: 2013-10-29 20:08:28
Message-ID: 527015BC.3020405@enova.com
Lists: pgsql-performance

On 10/29/13 1:20 PM, Tom Lane wrote:
> Jim Nasby <jnasby(at)enova(dot)com> writes:
>> On 10/29/13 11:45 AM, Tom Lane wrote:
>>> Jim Nasby <jnasby(at)enova(dot)com> writes:
>>>> I'm also wondering if it's time to raise those limits.
>
>>> Yeah, possibly. The current default values were set on machines much
>>> smaller/slower than most current hardware.
>>>
>>> I think also that the collapse limits were invented mainly to keep people
>>> out of GEQO's clutches, but we've made some significant fixes in GEQO
>>> since then. Maybe the real answer is to make the default collapse limits
>>> much higher, and lower geqo_threshold to whatever we think the threshold
>>> of pain is for applying the regular planner.
>
>> In my test case GEQO does seem to do a good job. I'll see if I can get some data on how the number of relations affects planning time... I don't get much of a warm fuzzy about lowering geqo_threshold, though.
>
> Yeah, it's probably not that simple. A trawl through the archives
> reminded me that we've discussed this quite a bit in the past already.
> The collapse limits are important for the regular planner not only to
> limit runtime but also to limit planner memory consumption; moreover,
> GEQO doesn't behave all that well either with very large join problems.
> These facts killed a proposal back in 2009 to remove the collapse limits
> altogether. There was also some discussion in 2011, see thread here:
> http://www.postgresql.org/message-id/BANLkTin4ncKMg+bEixi1WB1RJPrZ5dVRgQ@mail.gmail.com
> but the general feeling seemed to be that we needed more planner
> infrastructure work first. In particular it seems like the best way
> forward might require limiting subproblem size using something more
> sophisticated than just "number of relations".

Yeah, I saw one mention of 1GB of planner memory use in those threads... that's a bit disconcerting.
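
For reference, the knobs in question can be tweaked per-session for experiments like mine; a minimal sketch (the values are illustrative, not recommendations; the stock defaults are from_collapse_limit = 8, join_collapse_limit = 8, geqo_threshold = 12):

  SET from_collapse_limit = 20;  -- let the planner flatten larger FROM lists
  SET join_collapse_limit = 20;  -- allow reordering across more explicit JOINs
  SET geqo_threshold = 20;       -- keep the exhaustive planner in play longer
  SET geqo = off;                -- or take GEQO out of the picture entirely
  -- then re-run EXPLAIN on the problem query with \timing on to compare
  -- plan quality and planning time against the defaults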

Is there a way to measure memory consumption during planning, short of something like strace? (I've got no dev tools available on our servers.)
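
(Thinking out loud: one crude approach that needs nothing beyond a shell on the box, assuming Linux. Watch the backend's resident set size around a plain EXPLAIN; RSS also includes catalog caches and the like, so the delta is only a rough upper bound on planner memory.)

  -- Terminal 1: the psql session that will plan the query
  SELECT pg_backend_pid();   -- say it returns 12345

  -- Terminal 2: any shell on the server
  --   $ grep VmRSS /proc/12345/status    # baseline

  -- Terminal 1: plain EXPLAIN runs the planner but not the executor
  EXPLAIN SELECT ...;        -- the problem query here

  -- Terminal 2: check again; the growth approximates planning memory
  --   $ grep VmRSS /proc/12345/status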
--
Jim Nasby, Lead Data Architect (512) 569-9461
