From: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com> |
Subject: | Re: Foreign join pushdown vs EvalPlanQual |
Date: | 2015-11-04 09:50:23 |
Message-ID: | 5639D4DF.5020709@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2015/11/04 17:10, Kouhei Kaigai wrote:
>> On 2015/10/28 6:04, Robert Haas wrote:
>>> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita
>>> <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>>> Sorry, my explanation was not correct. (Needed to take in caffeine.) What
>>>> I'm concerned about is the following:
>>>>
>>>> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON
>>>> localtab.id = ft1.id FOR UPDATE OF ft1
>>>>
>>>> LockRows
>>>> -> Nested Loop
>>>> Join Filter: (localtab.id = ft1.id)
>>>> -> Seq Scan on localtab
>>>> -> Foreign Scan on <ft1, ft2>
>>>> Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x
>>>> FOR UPDATE OF ft1
>>>>
>>>> Assume that ft1 performs late row locking.
>>> If the SQL includes "FOR UPDATE of ft1", then it clearly performs
>>> early row locking. I assume you meant to omit that.
>>>> If an EPQ recheck was invoked
>>>> due to a concurrent transaction on the remote server that changed only the
>>>> value x of the ft1 tuple previously retrieved, then we would have to
>>>> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2
>>>> tuple previously retrieved was not a null tuple.) However, I'm not sure how
>>>> we can do that in ForeignRecheck; we can't know for example, which one is
>>>> outer and which one is inner, without an alternative local join execution
>>>> plan. Maybe I'm missing something, though.
>>> I would expect it to issue a new query like: SELECT * FROM ft1 LEFT
>>> JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1.
>> We assume here that ft1 uses late row locking, so I thought the above
>> SQL should include "FOR UPDATE of ft1". But I still don't think that
>> that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the
>> fake ft1/ft2-join tuple with nulls for ft2, as expected. The reason for
>> that is that the updated version of the ft1 tuple wouldn't satisfy the
>> ft1.tid = $0 condition in an EPQ recheck, because the ctid for the
>> updated version of the ft1 tuple has changed. (IIUC, I think that if we
>> use a TID scan for ft1, the SQL would generate the expected result,
>> because I think that the TID condition would be ignored in the EPQ
>> recheck, but I don't think it's guaranteed to use a TID scan for ft1.)
>> Maybe I'm missing something, though.
> It looks to me, we should not use ctid system column to identify remote
> row when postgres_fdw tries to support late row locking.
>
> The documentation says:
> http://www.postgresql.org/docs/devel/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
>
> UPDATE and DELETE operations are performed against rows previously
> fetched by the table-scanning functions. The FDW may need extra information,
> such as a row ID or the values of primary-key columns, to ensure that it can
> identify the exact row to update or delete
>
> The "rowid" should not be changed once it is fetched from the remote side
> until it is actually updated, deleted or locked, for correct identification.
> If ctid is used for this purpose, it is safe only when remote row is locked
> when it is fetched - it is exactly early row locking behavior, isn't it?
Yeah, we should use early row locking for a target foreign table in
UPDATE/DELETE.
In case of SELECT FOR UPDATE, I think we are allowed to use ctid to
identify target rows for late row locking, but I think the above SQL
should be changed to something like this:
SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT
JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x
>>> This should be significantly more efficient than fetching the base
>>> rows from each of two tables with two separate queries.
>> Maybe I think we could fix the SQL, so I have to admit that, but I'm
>> just wondering (1) what would happen for the case when ft1 uses late row
>> rocking and ft2 uses early row rocking and (2) that would be still more
>> efficient than re-fetching only the base row from ft1.
> It should be decision by FDW driver. It is not easy to estimate a certain
> FDW driver mixes up early and late locking policy within a same remote join
> query. Do you really want to support such a mysterious implementation?
Yeah, the reason for that is because GetForeignRowMarkType allows that.
> Or, do you expect all the FDW driver is enforced to return a joined tuple
> if remote join case?
No. That wouldn't make sense if at least one component table involved
in a foreign join uses the rowmark type other than ROW_MARK_COPY.
> It is different from my idea; it shall be an extra
> optimization option if FDW can fetch a joined tuple at once, but not always.
> So, if FDW driver does not support this optimal behavior, your driver can
> fetch two base tables then run local alternative join (or something other).
OK, so if we all agree that the joined-tuple optimization is just an
option for the case where all the component tables use ROW_MARK_COPY,
I'd propose to leave that for 9.6.
Best regards,
Etsuro Fujita
From | Date | Subject | |
---|---|---|---|
Next Message | Etsuro Fujita | 2015-11-04 09:59:02 | Re: Foreign join pushdown vs EvalPlanQual |
Previous Message | Antonin Houska | 2015-11-04 09:12:21 | Re: Bitmap index scans use of filters on available columns |