Shaun Thomas <sthomas(at)peak6(dot)com> wrote:
> I was thinking more:
>
> JOIN foo_tab USING (foo_id) WITH (COST=50)
The problem I have with that syntax is that it would be hard to read
when you have some nested set of joins or a (SELECT) in the JOIN
instead of simple table name. For me, at least, it would "get lost"
less easily if it were right next to the JOIN keyword.
The problem with a COST factor is that it's not obvious to me what
it would apply to:
- each row on the left?
- each row on the right?
- each row in the result of the JOIN step?
- the entire step?
How would it scale based on other criteria which affected the number
of rows on either side of the join?
If I'm understanding the problem correctly, the part the optimizer
gets wrong (because we don't yet have statistics to support a better
assumption) is assuming that selection criteria on opposite sides of
a join affect entirely independent sets of what would be in the
result without the criteria. To use an oft-cited example, when one
table is selected by zip code and the other by city, that's a bad
assumption about the correlation, leading to bad estimates, leading
to bad costing, leading to bad plans. The OP wanted to override
step 4, a COST setting would try to override step 3, but I think we
would want to override step 1 (until we get statistics which let us
compute that accurately).
-Kevin