From: | "David Johnston" <polobo(at)yahoo(dot)com> |
---|---|
To: | "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: A Better Way? (Multi-Left Join Lookup) |
Date: | 2012-07-20 23:38:29 |
Message-ID: | 010301cd66d0$c4a71f50$4df55df0$@yahoo.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org [mailto:pgsql-general-
> owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Sent: Friday, July 20, 2012 6:51 PM
> To: David Johnston
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] A Better Way? (Multi-Left Join Lookup)
>
> "David Johnston" <polobo(at)yahoo(dot)com> writes:
> >> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us] Surely not. Neither merge
> >> nor hash joins require an index. What plan is getting selected?
>
> > I have attached a scrubbed query and explain/analyze. Let me know if
> > something more is needed.
>
> Well, here's your problem:
>
> > CTE master_listing {# The LEFT side of the multi-joins #}
> > -> Subquery Scan on call (cost=22762.65..22762.94 rows=1
> > width=32) (actual time=619.158..735.559 rows=8656 loops=1)
>
> The planner thinks master_listing will return only one row, which would
> make a nestloop the right way to do things. However, with 8500 rows
coming
> out, the nestloop iterates 8500 times and takes forever.
>
> So what you need to do is figure out why that rowcount estimate is so far
off
> and do whatever's needful to make it better. It does not have to be dead
on
> --- even an estimate of a few dozen rows would likely be enough to
> discourage the planner from using a nestloop.
>
> You haven't shown enough info for anybody else to guess exactly why the
> rowcount estimate is bad, though.
>
> regards, tom lane
>
OK.
So,
EXPLAIN SELECT function_call(...) -- yields a planner expectation of 1 row
[Whereas]
EXPLAIN SELECT * FROM function_call(...) -- yields a planner expectation of
"result_rows" which defaults to 1000
The syntax:
SELECT function_call(field_on_another_relation)
FROM another_relation
Is convenient in order to avoid...
SELECT * FROM function_call(
(SELECT field_on_another_relation FROM another_relation)
);
...especially when you need multiple fields from "another_relation"
I guess I get the idea that a function used "inline" is generally going to
return a single result and so the estimate of "1" is most probable.
May I suggest, then, that the CREATE FUNCTION documentation for "ROWS
result_rows" be modified:
Something like:
"The default assumption is 1,000 rows if the function is called in the FROM
clause of a query. If it is called anywhere else (e.g., the Select List)
the assumption is 1 row regardless of an explicit or default ROWS estimate."
Was this an intentional design decision to override the result_rows estimate
of the function if it is used in the select list? I get the general
reasoning behind it and do not know enough regarding the internals to make a
judgement but if intentional could it maybe be a little smarter as to when
the override occurs? Obviously the ideal solution is to implement
LATERAL...
FYI: I included the following section in the query I provided because I
suspected the function call may have been the issue...
, master_listing AS (
SELECT
-- identifier fields
FROM (
SELECT (func).* FROM (
SELECT fuction_generating_8500_records(...)
--<<<<< Use of function in Select List; results in the 1-row estimate.
1,000 rows triggers the "MERGE LEFT JOIN" plan
) func
FROM scenario_info
) call
) master (function_column_rename)
)
Thank You!
David J.
From | Date | Subject | |
---|---|---|---|
Next Message | Craig Ringer | 2012-07-21 01:41:09 | Re: big database resulting in small dump |
Previous Message | Tom Lane | 2012-07-20 22:51:09 | Re: A Better Way? (Multi-Left Join Lookup) |