Re: Push down more full joins in postgres_fdw

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Push down more full joins in postgres_fdw
Date: 2016-10-26 09:25:03
Message-ID: CAFjFpRcsm1j8YdKT3PC43Z_F5QufYC4M_mnh_ghy-0iAzh+S9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
>>>> I guess, the arrays need to be
>>>> computed only once for any relation when the query for that relation
>>>> is deparsed the first time.
>
>
>>> Does this algorithm extend to the case where we consider paths for every
>>> join order?
>
>
>> Yes, if we store the information about which of relations need
>> subquery and which don't for every join order.
>
>
> Hmm. Sorry, I'm not so excited about this proposal. I think (1) that is
> solving a problem that hasn't been proven to be a problem, (2) that would
> complicate the deparser logic, and (3) the cost of creating this array for
> each relation by the bottom-up method while deparsing a remote query would
> be not small (especially when the query is large), so that might need more
> cycles for deparsing the query than what I proposed when
> use_remote_estimate=false. So, I'd like to go with what I proposed, at
> least as the first cut.

The arrays will need to computed only when there is at least one
relation required to be deparsed as subquery. Every relation above the
relation which is converted into subquery requires the array. Each
array will be N * size of pointer long where N is the largest relid
covered by that relation. N will vary across the RelOptInfo hierarchy.
All that array holds is targetlist pointer which are required to
computed anyway. So, the amount of memory is bounded by N * size of
pointer * (2N - 1), which is way lesser than what we use in other
places.

Your patch calls isSubqueryExpr() recursively for every Var in the
targetlist, which can be many for foreign tables with many columns.
For every such Var it may need to reach upto the relation which is
converted into subquery, which can as bad as reaching every base
relation. So, it looks like the number of recursive calls to
isSubqueryExpr() is bounded by V * N (i.e. worst case depth of the
RelOptInfo tree), which can be quite costly. If use_remote_estimates
is true, we do this for every intermediate relation and for every path
created. That isn't very performance efficient.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2016-10-26 09:34:09 Re: Declarative partitioning - another take
Previous Message Amit Kapila 2016-10-26 09:13:27 Re: [BUG] pg_basebackup from disconnected standby fails