Quick Links

Re: Why am I getting great/terrible estimates with these CTE queries?

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: Why am I getting great/terrible estimates with these CTE queries?
Date:	2012-10-09 23:09:21
Message-ID:	9075.1349824161@sss.pgh.pa.us
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> I've been fighting with some CTE queries recently, and in the end I've
> ended up with two basic cases. In one case the CTEs work absolutely
> great, making the estimates much more precise, while in the other the
> results are pretty terrible. And I'm not sure why both of these behave
> the way they do.

You're assuming the case where the estimate is better is better for a
reason ... but it's only better as a result of blind dumb luck. The
outer-level query planner doesn't know anything about the CTE's output
except the estimated number of rows --- in particular, it doesn't drill
down to find any statistics about the join column. So what you're
getting there is a default selectivity estimate that just happens to
match reality in this case. (If you work through the math in
eqjoinsel_inner for the case where the relation sizes are grossly
different and we don't have MCV stats, you'll find that it comes out to
be assuming that each row in the larger relation has one join partner in
the smaller one, which indeed is your situation here.) In the other
example, you're likewise getting a default selectivity estimate, only it
doesn't match so well, because the default does not include assuming
that the join keys are unique on both sides. Without the CTEs, the
optimizer can see the keys are unique so it makes the right selectivity
estimate.

In principle we could make the optimizer try to drill down for stats,
which would make these examples work the same with or without the CTE
layers. I'm not sure it's worth the trouble though --- I'm dubious that
people would use a CTE for cases that are simple enough for the stats
estimates to be worth anything.

regards, tom lane

In response to

Why am I getting great/terrible estimates with these CTE queries? at 2012-10-09 22:21:31 from Tomas Vondra

Responses

Re: Why am I getting great/terrible estimates with these CTE queries? at 2012-10-10 00:19:32 from Tomas Vondra

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Craig James	2012-10-09 23:30:11	Re: Hyperthreading (was: Two identical systems, radically different performance)
Previous Message	Tomas Vondra	2012-10-09 22:21:31	Why am I getting great/terrible estimates with these CTE queries?