Costing foreign joins in postgres_fdw

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Costing foreign joins in postgres_fdw
Date: 2015-12-18 12:16:37
Message-ID: CAFjFpRcqSwUs+tb5iyp1M3c-w0k3xaB6H5mw4+N2q2iuAfSzKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi All,
Costs for foreign queries are either obtained from the foreign server using
EXPLAIN (if use_remote_estimate is ON) otherwise they are cooked up locally
based on the statistics available. For joins as well, we have to do the
same. If use_remote_estimates [1] is ON, we can get the costs from the
foreign server. Rest of the mail discusses approaches for estimating the
costs when use_remote_estimates is OFF.

1. Unlike base relations where the table data "has to be" fetched from the
foreign server, a join doesn't "have to be" fetched from the foreign
server. So, even if use_remote_estimate is OFF for a base relation, we do
try to estimate something locally. But for a join that's not compulsory, so
we can choose not to estimate anything and not push down the join if
use_remote_estimate is OFF. Whether we do that depends upon how well we can
estimate the join cost when use_remote_estimate is OFF.

2. Locally estimating the cost of join that will be performed on the
foreign server is difficult because we do not know which join strategy the
foreign server is going to use, which in turn depends upon the availability
of indexes, work memory, statistics about joining expressions etc. One way
to do this is to use the cost of cheapest local join path built upon
foreign outer and inner paths, to estimate the cost of executing the join
at the foreign server The startup and run time costs for sending, parsing
and planning query at the foreign server as well as the cost to fetch the
tuples need to be adjusted, so that it doesn't get counted twice. We may
assume that the cost for the foreign join will be some factor of the
adjusted cost, like we have done for estimating cost of sort pushdown. The
reason we choose cheapest path with foreign inner and outer paths is
because that's likely to be a closer to the real estimate than the path
which does not have foreign inner and outer paths. In the absence of such
path, we should probably not push the join down since no local path has
found pushing inner and outer to be cheaper and it's likely (certainly not
a rule) that pushing the join in question down is not going to be cheaper
than the local paths.

1st option is easy but it sounds too restrictive. 2nd option liberal but is
complex.

Any other ideas as to how we can estimate cost of foreign join when
use_remote_estimate is OFF?

[1]
http://www.postgresql.org/message-id/CAFjFpRepSC2e3mZ1uYSopJD6R19fOZ0dNNf9Z=gnyKSB6wGk5g@mail.gmail.com
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Albe Laurenz 2015-12-18 13:09:35 Re: Costing foreign joins in postgres_fdw
Previous Message Amit Kapila 2015-12-18 11:25:56 Re: Patch: fix lock contention for HASHHDR.mutex