Re: Thinking About Correlated Columns (again)

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Craig James <cjames(at)emolecules(dot)com>
Cc: sthomas(at)optionshouse(dot)com, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Thinking About Correlated Columns (again)
Date: 2013-05-15 16:39:46
Message-ID: 5193BA52.7040007@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


On 05/15/2013 12:23 PM, Craig James wrote:
> On Wed, May 15, 2013 at 8:31 AM, Shaun Thomas
> <sthomas(at)optionshouse(dot)com <mailto:sthomas(at)optionshouse(dot)com>> wrote:
>
> [Inefficient plans for correlated columns] has been a pain point
> for quite a while. While we've had several discussions in the
> area, it always seems to just kinda trail off and eventually
> vanish every time it comes up.
>
> ...
> Since there really is no fix for this aside from completely
> rewriting the query or horribly misusing CTEs (to force sequence
> scans instead of bloated nested loops, for example)...
> I'm worried that without an easy answer for cases like this, more
> DBAs will abuse optimization fences to get what they want and
> we'll end up in a scenario that's actually worse than query hints.
> Theoretically, query hints can be deprecated or have GUCs to
> remove their influence, but CTEs are forever, or until the next
> code refactor.
>
> I've seen conversations on this since at least 2005. There were
> even proposed patches every once in a while, but never any
> consensus. Anyone care to comment?
>
>
> It's a very hard problem. There's no way you can keep statistics
> about all possible correlations since the number of possibilities is
> O(N^2) with the number of columns.
>

I don't see why we couldn't allow the DBA to specify some small subset
of the combinations of columns for which correlation stats would be
needed. I suspect in most cases no more than a handful for any given
table would be required.

cheers

andrew

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Nikolas Everett 2013-05-15 17:30:57 Re: Thinking About Correlated Columns (again)
Previous Message Shaun Thomas 2013-05-15 16:27:29 Re: Thinking About Correlated Columns (again)