From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> |
Cc: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com> |
Subject: | Re: Foreign join pushdown vs EvalPlanQual |
Date: | 2015-10-19 22:07:50 |
Message-ID: | CA+TgmobAefSqj2-EvVo52sQ7AnUzFC-c0NMOPxjckhOFeE1LUQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Oct 19, 2015 at 12:17 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
> 1. Fetch every EPQ slot of base relations involved in this join.
> In case of ForeignScan, all the required tuples of base relations
> should be filled because it is preliminary fetched by whole-row var
> if earlier row-locking, or by RefetchForeignRow if later row-locking.
> In case of CustomScan, it can call ExecProcNode() to generate the
> first tuple even if it does not exists.
> Anyway, I assume all the component tuples of this join can be fetched
> using existing EPQ slot because they are owned by base relations.
>
> 2. The recheck callback fills up ss_ScanTupleSlot according to the
> fdw_scan_tlist or custom_scan_tlist. The callback knows the best way
> to reconstruct the joined tuple from the base relations' tuple fetched
> on the step-1.
> For example, if joined tuple is consists of (t1.a, t1.b, t2.x, t3.s),
> the callback picks up 't1.a' and 't1.b' from the tuple fetched from
> the EPQ slot of t1, then put these values onto the 1st and 2nd slot.
> Also, it picks up 't2.x' from the tuple fetched from the EPQ slot of
> t2, then put this value onto the 3rd slot. Same as above for 't3'.
> At this point, ss_ScanTupleSlot gets filled up by the expected fields
> as if join clauses are satisfied.
>
> 3. The recheck callback also checks qualifiers of base relations that
> are pushed down. Because expression nodes kept in fds_exprs or
> custom_exprs are initialized to reference ss_ScanTupleSlot at setrefs.c,
> it is more reasonable to run ExecQual after the step-2.
> If one of the qualifiers of base relation was evaluated as false,
> the recheck callback returns an empty slot.
>
> 4. The recheck callback also checks join-clauses to join underlying
> base relations. Due to same reason at step-3, it is more reasonable
> to execute ExecQual after the step-2.
> If one of the join-clauses was evaluated as false, the recheck returns
> an empty slot.
> Elsewhere, it returns ss_ScanTupleSlot, then ExecScan will process
> any further jobs.
Hmm, I guess this would work. But it still feels unnatural to me. It
feels like we haven't really pushed down the join. It's pushed down
except when there's an EPQ check, and then it's not. So we need a
whole alternate plan tree. With my proposal, we don't need that.
There is also some possible loss of efficiency with this approach.
Suppose that we have two tables ft1 and ft2 which are being joined,
and we push down the join. They are being joined on an integer
column, and the join needs to select several other columns as well.
However, ft1 and ft2 are very wide tables that also contain some text
columns. The query is like this:
SELECT localtab.a, ft1.p, ft2.p FROM localtab LEFT JOIN (ft1 JOIN ft2
ON ft1.x = ft2.x AND ft1.huge ~ 'stuff' AND f2.huge2 ~ 'nonsense') ON
localtab.q = ft1.q;
If we refetch each row individually, we will need a wholerow image of
ft1 and ft2 that includes all columns, or at least f1.huge and
f2.huge2. If we just fetch a wholerow image of the join output, we
can exclude those. The only thing we need to recheck is that it's
still the case that localtab.q = ft1.q (because the value of
localtab.q might have changed).
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Jim Nasby | 2015-10-19 22:12:08 | Re: SuperUser check in pg_stat_statements |
Previous Message | rajan | 2015-10-19 20:48:04 | Re: SuperUser check in pg_stat_statements |