Re: Common Table Expressions applied; some issues remain

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Yoshiyuki Asaba" <y-asaba(at)sraoss(dot)co(dot)jp>, "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Common Table Expressions applied; some issues remain
Date: 2008-10-07 04:18:10
Message-ID: 603c8f070810062118u5fad359dlbb1bfc2d129ace2a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> If the planner chooses to do this as a nestloop with table2 on the
> inside, then expensive_function() can get evaluated multiple times on
> the same row of table2. We really don't make very many guarantees about
> what will happen with functions inside inlined views, even with "offset
> 0" as an optimization fence. So I was thinking that taking a strong
> reading of the spec's wording about single evaluation of WITH clauses
> might provide useful leverage for people who need to control evaluation
> of expensive or volatile functions better than they can now.

+1 for a strong reading. I think the ability to prevent multiple
evaluations of expensive functions is key here.

> Another possibility that we could think about is: if a CTE is only
> referenced once, then push down any restriction clauses that are
> available at the single call site, but still execute it using the
> CteScan materialization logic. The evaluation guarantee would then
> look like "no row of the CTE's result is evaluated twice, but some rows
> might not be evaluated at all".

Assuming a perfectly intelligent optimizer, the only advantage of the
=1 guarantee over the <=1 guarantee is that you can evaluate the
entire CTE for side-effects and then fetch back only a subset of the
data to return to the user. This seems likely to be a pretty rare use
case, though, and the rest of the time you'd presumably prefer for
performance reasons to have as little of the CTE as possible
executed.... so +1 for <=1.

> What we'd pay for this is that the CTE
> could not be the inside of a nestloop with inner indexscan using a join
> condition, since we don't have any way to keep track of which rows were
> already fetched in that case.

Is it not possible to consider both plans? That is, compare the cost
of evaluating every row and then doing a nestloop with inner indexscan
versus using some other plan and evaluating only the rows meeting the
quals?

As a side note, in theory, I think you could generalize this to CTEs
with multiple call sites by taking the logical OR of the available
quals. This might not be worth it, though unless the quals are highly
selective.

...Robert

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2008-10-07 05:58:16 Re: Shouldn't pg_settings.enumvals be array of text?
Previous Message Tom Lane 2008-10-07 00:39:15 Re: Common Table Expressions applied; some issues remain