Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, "pgsql-hackers(at)postgreSQL(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom/Foreign-Join-APIs (Re: [v9.5] Custom Plan API)
Date: 2015-04-22 10:28:04
Message-ID: CAFjFpRexnwj6s+3eevVE6Aq3AAkWwPX+Ymm93qkb6V5QGHM=9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I reviewed the foreign_join_v13 patch. Here are my comments

Thanks for this work. It's good to see that the The foreign_join patch
includes extensive tests for postgres_fdw. Thanks for the same.

Sanity
---------
The patch foreign_join didn't get applied cleanly with "git apply" but got
applied using "patch". The patch has "trailing whitespace"s.

The patch compiles cleanly with pgsql-v9.5-custom-join.v11.patch.

make check in regress and postgres_fdw folders passes without any failures.

Tests
-------
1.The postgres_fdw test is re/setting enable_mergejoin at various places.
The goal of these tests seems to be to test the sanity of foreign plans
generated. So, it might be better to reset enable_mergejoin (and may be all
of enable_hashjoin, enable_nestloop_join etc.) to false at the beginning of
the testcase and set them again at the end. That way, we will also make
sure that foreign plans are chosen irrespective of future planner changes.
2. In the patch, I see that the inheritance testcases have been deleted
from postgres_fdw.sql, is that intentional? I do not see those being
replaced anywhere else.
3. We need one test for each join type (or at least for INNER and LEFT
OUTER) where there are unsafe to push conditions in ON clause along-with
safe-to-push conditions. For INNER join, the join should get pushed down
with the safe conditions and for OUTER join it shouldn't be. Same goes for
WHERE clause, in which case the join will be pushed down but the
unsafe-to-push conditions will be applied locally.
4. All the tests have ORDER BY, LIMIT in them, so the setref code is being
exercised. But, something like aggregates would test the setref code
better. So, we should add at-least one test like select avg(ft1.c1 +
ft2.c2) from ft1 join ft2 on (ft1.c1 = ft2.c1).
5. It will be good to add some test which contain join between few foreign
and few local tables to see whether we are able to push down the largest
possible foreign join tree to the foreign server.

Code
-------
In classifyConditions(), the code is now appending RestrictInfo::clause
rather than RestrictInfo itself. But the callers of classifyConditions()
have not changed. Is this change intentional? The functions which consume
the lists produced by this function handle expressions as well
RestrictInfo, so you may not have noticed it. Because of this change, we
might be missing some optimizations e.g. in function
postgresGetForeignPlan()
793 if (list_member_ptr(fpinfo->remote_conds, rinfo))
794 remote_conds = lappend(remote_conds, rinfo->clause);
795 else if (list_member_ptr(fpinfo->local_conds, rinfo))
796 local_exprs = lappend(local_exprs, rinfo->clause);
797 else if (is_foreign_expr(root, baserel, rinfo->clause))
798 remote_conds = lappend(remote_conds, rinfo->clause);
799 else
800 local_exprs = lappend(local_exprs, rinfo->clause);
Finding a RestrictInfo in remote_conds avoids another call to
is_foreign_expr(). So with this change, I think we are doing an extra call
to is_foreign_expr().

The function get_jointype_name() returns an empty string for unsupported
join types. Instead of that it should throw an error, if some code path
accidentally calls the function with unsupported join type e.g. SEMI_JOIN.

While deparsing the SQL with rowmarks, the placement of FOR UPDATE/SHARE
clause in the original query is not being honored, which means that we will
end up locking the rows which are not part of the join result even when the
join is pushed to the foreign server. E.g take the following query (it uses
the tables created in postgres_fdw.sql tests)
contrib_regression=# explain verbose select * from ft1 join ft2 on (ft1.c1
= ft2.c1)* for update of ft1*;

QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
LockRows (cost=100.00..124.66 rows=822 width=426)
Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7, ft1.c8,
ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8, ft1.*, ft2.*
-> Foreign Scan (cost=100.00..116.44 rows=822 width=426)
Output: ft1.c1, ft1.c2, ft1.c3, ft1.c4, ft1.c5, ft1.c6, ft1.c7,
ft1.c8, ft2.c1, ft2.c2, ft2.c3, ft2.c4, ft2.c5, ft2.c6, ft2.c7, ft2.c8,
ft1.*,
ft2.*
Relations: (public.ft1) INNER JOIN (public.ft2)
Remote SQL: SELECT l.a1, l.a2, l.a3, l.a4, l.a5, l.a6, l.a7, l.a8,
l.a9, r.a1, r.a2, r.a3, r.a4, r.a5, r.a6, r.a7, r.a8, r.a9 FROM (SELECT l.a
10, l.a11, l.a12, l.a13, l.a14, l.a15, l.a16, l.a17, ROW(l.a10, l.a11,
l.a12, l.a13, l.a14, l.a15, l.a16, l.a17) FROM
*(SELECT "C 1" a10, c2 a11, c3 a12, c4 a13, c5 a14, c6 a15, c7 a16, c8 a17
FROM "S 1"."T 1" FOR UPDATE)* l) l (a1, a2, a3, a4, a5, a6, a7, a8, a9)
INNER JOIN (SELECT r.a9, r.a10, r.a12,
r.a13, r.a14, r.a15, r.a16, r.a17, ROW(r.a9, r.a10, r.a12, r.a13, r.a14,
r.a15, r.a16, r.a17) FROM (SELECT "C 1" a9, c2 a10, c3 a12, c4 a13, c5 a14,
c6
a15, c7 a16, c8 a17 FROM "S 1"."T 1") r) r (a1, a2, a3, a4, a5, a6, a7,
a8, a9) ON ((l.a1 = r.a1))
(6 rows)
It's expected that only the rows which are part of join result will be
locked by FOR UPDATE clause. The query sent to the foreign server has
attached the FOR UPDATE clause to the sub-query for table ft1 ("S 1"."T 1"
on foreign server). As per the postgresql documentation, "When a locking
clause appears in a sub-SELECT, the rows locked are those returned to the
outer query by the sub-query.". So it's going to lock all rows from "S
1"."T 1", rather than only the rows which are part of join. This is going
to increase probability of deadlocks, if the join is between a big table
and small table where big table is being used in many queries and the join
is going to have only a single row in the result.

Since there is no is_first argument to appendConditions(), we should remove
corresponding line from the function prologue.

The name TO_RELATIVE() doesn't convey the full meaning of the macro. May be
GET_RELATIVE_ATTNO() or something like that.

In postgresGetForeignJoinPaths(), while separating the conditions into join
quals and other quals,
3014 if (IS_OUTER_JOIN(jointype))
3015 {
3016 extract_actual_join_clauses(joinclauses, &joinclauses,
&otherclauses);
3017 }
3018 else
3019 {
3020 joinclauses = extract_actual_clauses(joinclauses, false);
3021 otherclauses = NIL;
3022 }
we shouldn't differentiate between outer and inner join. For inner join the
join quals can be treated as other clauses and they will be returned as
other clauses, which is fine. Also, the following condition
3050 /*
3051 * Other condition for the join must be safe to push down.
3052 */
3053 foreach(lc, otherclauses)
3054 {
3055 Expr *expr = (Expr *) lfirst(lc);
3056
3057 if (!is_foreign_expr(root, joinrel, expr))
3058 {
3059 ereport(DEBUG3, (errmsg("filter contains unsafe
conditions")));
3060 return;
3061 }
3062 }
is unnecessary. I there are filter conditions which are unsafe to push
down, they can be applied locally after obtaining the join result from the
foreign server. The join quals are all needed to be safe to push down,
since they decide which rows will contain NULL inner side in an OUTER join.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

On Fri, Apr 17, 2015 at 10:13 AM, Shigeru HANADA <shigeru(dot)hanada(at)gmail(dot)com>
wrote:

> Kaigai-san,
>
> 2015/04/17 10:13、Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> のメール:
>
> > Hanada-san,
> >
> >> I merged explain patch into foreign_join patch.
> >>
> >> Now v12 is the latest patch.
> >>
> > It contains many garbage lines... Please ensure the
> > patch is correctly based on tOhe latest master +
> > custom_join patch.
>
> Oops, sorry. I’ve re-created the patch as v13, based on Custom/Foreign
> join v11 patch and latest master.
>
> It contains EXPLAIN enhancement that new subitem “Relations” shows
> relations and joins, including order and type, processed by the foreign
> scan.
>
> --
> Shigeru HANADA
> shigeru(dot)hanada(at)gmail(dot)com
>
>
>
>
>
>
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2015-04-22 11:40:07 Re: Improve sleep processing of pg_rewind TAP tests
Previous Message Petr Jelinek 2015-04-22 08:17:50 Re: Replication identifiers, take 4