Re: star schema and the optimizer

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marc Cousin <cousinmarc(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: star schema and the optimizer
Date: 2015-02-27 15:49:31
Message-ID: 31069.1425052171@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Marc Cousin <cousinmarc(at)gmail(dot)com> writes:
> On 27/02/2015 15:08, Tom Lane wrote:
>> That's right, and as you say, the planning-speed consequences of doing
>> otherwise would be disastrous.

> What do you mean by disastrous ?

Let's assume you have ten fact tables, so ten join conditions (11 tables
altogether). As-is, the planner considers ten 2-way joins (all between
the fact table and one or another dimension table). At the level of 3-way
joins, there are 45 possible join relations each of which has just 2
possible sets of input relations. At the level of 4-way joins, there are
120 join relations to consider each of which can be made in only 3 ways.
And so on. It's combinatorial but the growth rate is manageable.

On the other hand, if we lobotomize the must-have-a-join-condition filter,
there are 11 choose 2 possible 2-way joins, or 55. At the level of 3-way
joins, there are 165 possible joins, but each of these can be made in 3
different ways from a 2-way join and a singleton. At the level of 4-way
joins, there are 330 possible joins, but each of these can be made in
4 ways from a 3-way join and a singleton plus 6 different ways from two
2-way joins. So at this level we're considering something like 3300
different join paths versus 360 in the existing planner. It gets rapidly
worse.

The real problem is that in most cases, all those extra paths are utterly
useless. They would not be less useless just because you're a "data
warehouse" or whatever. So I'm uninterested in simply lobotomizing that
filter.

What would make more sense is to notice that the fact table has an index
that's amenable to this type of optimization, and then use that to loosen
up the join restrictions, rather than just blindly consider useless joins.

I had actually thought that we'd fixed this type of problem in recent
versions, and that you should be able to get a plan that would look like

Nestloop
-> scan dim1
-> Nestloop
-> scan dim2
-> indexscan fact table using dim1.a and dim2.b

which would arise from developing an indexscan on fact that's
parameterized by both other tables, resolving one of those
parameterizations via a join to dim2, and then the other one
via a join to dim1. I'm not sure offhand why that isn't working
in this example.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2015-02-27 16:19:12 Re: star schema and the optimizer
Previous Message Marc Cousin 2015-02-27 15:48:09 Re: star schema and the optimizer