From: | "Robert Haas" <robertmhaas(at)gmail(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Yoshiyuki Asaba" <y-asaba(at)sraoss(dot)co(dot)jp>, "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Common Table Expressions applied; some issues remain |
Date: | 2008-10-07 04:18:10 |
Message-ID: | 603c8f070810062118u5fad359dlbb1bfc2d129ace2a@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
> If the planner chooses to do this as a nestloop with table2 on the
> inside, then expensive_function() can get evaluated multiple times on
> the same row of table2. We really don't make very many guarantees about
> what will happen with functions inside inlined views, even with "offset
> 0" as an optimization fence. So I was thinking that taking a strong
> reading of the spec's wording about single evaluation of WITH clauses
> might provide useful leverage for people who need to control evaluation
> of expensive or volatile functions better than they can now.
+1 for a strong reading. I think the ability to prevent multiple
evaluations of expensive functions is key here.
> Another possibility that we could think about is: if a CTE is only
> referenced once, then push down any restriction clauses that are
> available at the single call site, but still execute it using the
> CteScan materialization logic. The evaluation guarantee would then
> look like "no row of the CTE's result is evaluated twice, but some rows
> might not be evaluated at all".
Assuming a perfectly intelligent optimizer, the only advantage of the
=1 guarantee over the <=1 guarantee is that you can evaluate the
entire CTE for side-effects and then fetch back only a subset of the
data to return to the user. This seems likely to be a pretty rare use
case, though, and the rest of the time you'd presumably prefer for
performance reasons to have as little of the CTE as possible
executed.... so +1 for <=1.
> What we'd pay for this is that the CTE
> could not be the inside of a nestloop with inner indexscan using a join
> condition, since we don't have any way to keep track of which rows were
> already fetched in that case.
Is it not possible to consider both plans? That is, compare the cost
of evaluating every row and then doing a nestloop with inner indexscan
versus using some other plan and evaluating only the rows meeting the
quals?
As a side note, in theory, I think you could generalize this to CTEs
with multiple call sites by taking the logical OR of the available
quals. This might not be worth it, though unless the quals are highly
selective.
...Robert
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Smith | 2008-10-07 05:58:16 | Re: Shouldn't pg_settings.enumvals be array of text? |
Previous Message | Tom Lane | 2008-10-07 00:39:15 | Re: Common Table Expressions applied; some issues remain |