From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Costing foreign joins in postgres_fdw
Date: 2015-12-18 16:39:13
Message-ID: CA+TgmoZbbnCX_9c=kqUis9cMUb61GO+5EJP7rMCigVmYupOXzQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 18, 2015 at 8:09 AM, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
> My gut feeling is that for a join where all join predicates can be pushed down, it
> will usually be a win to push the join to the foreign server.
>
> So in your first scenario, I'd opt for always pushing the join down, if possible,
> when use_remote_estimate is OFF.
>
> Your second scenario essentially estimates that a pushed-down join will
> always be executed as a nested loop join, which will in most cases produce
> an unfairly negative estimate.

+1 to all that. Whatever we do here for costing in detail, it should
be set up so that the pushed-down join wins unless there's some pretty
tangible reason to think, in a particular case, that it will lose.
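
To make that policy concrete, here is a minimal sketch in plain C. It
is not actual postgres_fdw code; the function name and the 0.99
discount are assumptions for illustration. The idea is just that, when
every join clause is shippable, the remote join is priced slightly
below the cheapest local join path so that it wins by default:

typedef double Cost;

/* Hypothetical helper, not postgres_fdw code: assuming every join
 * clause is shippable, price the remote join just under the cheapest
 * local join path so it wins unless something tangible says otherwise.
 * The 0.99 discount is an arbitrary illustrative constant. */
static Cost
remote_join_cost(Cost cheapest_local_total)
{
    return cheapest_local_total * 0.99;
}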

> What about using local statistics to come up with an estimated row count for
> the join and use that as the basis for an estimate? My idea here is that it
> will always be a win to push down a join unless the result set is so large that
> transferring it becomes the bottleneck.

This also sounds about right.

> Maybe, to come up with something remotely realistic, a formula like
>
> sum of locally estimated costs of sequential scan for the base table
> plus count of estimated result rows (times a factor)

Was this meant to say "the base tables", plural?
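
Reading it that way, a sketch of the formula in plain C might look like
the following. This is an assumption about what Laurenz means, not
working planner code: seq_scan_cost[] would hold the locally estimated
sequential-scan cost of each base table, rows_est would be the join
size derived from local statistics, and per_row_cost would play roughly
the role of postgres_fdw's fdw_tuple_cost (the per-row transfer
factor):

typedef double Cost;

/* Hypothetical: sum the local seqscan costs of all base tables, then
 * charge a per-row factor for shipping the join result back. */
static Cost
estimate_remote_join_cost(const Cost *seq_scan_cost, int ntables,
                          double rows_est, Cost per_row_cost)
{
    Cost total = 0.0;

    for (int i = 0; i < ntables; i++)
        total += seq_scan_cost[i];      /* local scan estimate per table */

    total += rows_est * per_row_cost;   /* transfer cost of result rows */
    return total;
}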

I think whatever we do here should try to extend the logic in
postgres_fdw's estimate_path_cost_size() to foreign joins in some
reasonably natural way, but I'm not sure exactly what that should look
like. Maybe do what that function currently does for single-table
scans, and then add all the values up, or something like that. I'm a
little worried, though, that the planner might then view a query that
will be executed remotely as a nested loop with an inner index scan as
not worth pushing down, because in that case the join actually will
not touch every row from both tables, as a hash or merge join would.
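
As a rough sketch of "add all the values up" (again an assumption, not
the real estimate_path_cost_size), each per-table estimate would come
from the existing single-table costing, and the caveat above applies:
if the remote server actually runs the join as a nested loop with an
inner index scan, summing full-scan costs will overestimate, so some
discount may be needed:

typedef double Cost;

/* Hypothetical per-table result from the existing single-table logic. */
typedef struct PerTableEst
{
    Cost    startup_cost;   /* startup portion of the estimate */
    Cost    run_cost;       /* total minus startup for that table */
} PerTableEst;

/* Sum the single-table estimates and charge a per-row transfer cost
 * for the join result; join_rows would come from local join
 * selectivity estimation. */
static void
sum_single_table_estimates(const PerTableEst *est, int ntables,
                           double join_rows, Cost fdw_tuple_cost,
                           Cost *p_startup, Cost *p_total)
{
    Cost startup = 0.0;
    Cost run = 0.0;

    for (int i = 0; i < ntables; i++)
    {
        startup += est[i].startup_cost;
        run += est[i].run_cost;
    }

    run += join_rows * fdw_tuple_cost;  /* shipping the join result */

    *p_startup = startup;
    *p_total = startup + run;
}
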
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company