From: | Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> |
Cc: | Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: bug in update tuple routing with foreign partitions |
Date: | 2019-04-11 11:31:06 |
Message-ID: | 5CAF257A.9050604@lab.ntt.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
(2019/04/11 14:33), Amit Langote wrote:
> On 2019/04/10 17:38, Etsuro Fujita wrote:
>> (2019/03/06 18:33), Amit Langote wrote:
>>> I noticed a bug with how UPDATE tuple routing initializes ResultRelInfos
>>> to use for partition routing targets. Specifically, the bug occurs when
>>> UPDATE targets include a foreign partition that is locally modified (as
>>> opposed to being modified directly on the remove server) and its
>>> ResultRelInfo (called subplan result rel in the source code) is reused for
>>> tuple routing:
>>
>>> The solution is to teach ExecSetupPartitionTupleRouting to avoid using a
>>> subplan result rel if its ri_FdwState is already set.
>>
>> I'm not sure that is a good idea, because that requires to create a new
>> ResultRelInfo, which is not free; I think it requires a lot of work.
>
> After considering this a bit more and studying your proposed solution, I
> tend to agree. Beside the performance considerations you point out that I
> too think are valid, I realize that going with my approach would mean
> embedding some assumptions in the core code about ri_FdwState; in this
> case, assuming that a subplan result rel's ri_FdwState is not to be
> re-purposed for tuple routing insert, which is only based on looking at
> postgres_fdw's implementation.
Yeah, that assumption might not apply to some other FDWs.
> Not to mention the ensuing complexity in
> the core tuple routing code -- it now needs to account for the fact that
> not all subplan result rels may have been re-purposed for tuple-routing,
> something that's too complicated to handle in PG 11.
I think so too.
> IOW, let's call this a postgres_fdw bug and fix it there as you propose.
Agreed.
>> Another solution to avoid that would be to store the fmstate created by
>> postgresBeginForeignInsert() into the ri_FdwState, not overwrite the
>> ri_FdwState, like the attached. This would not need any changes to the
>> core, and this would not cause any overheads either, IIUC. What do you
>> think about that?
>
> +1. Just one comment:
>
> + /*
> + * If the given resultRelInfo already has PgFdwModifyState set, it means
> + * the foreign table is an UPDATE subplan resultrel; in which case, store
> + * the resulting state into the PgFdwModifyState.
> + */
>
> The last "PgFdwModifyState" should be something else? Maybe,
> sub-PgModifyState or sub_fmstate?
OK, will revise. My favorite would be the latter.
>>> I've also added a
>>> test in postgres_fdw.sql to exercise this test case.
>>
>> Thanks again, but the test case you added works well without any fix:
>
> Oops, I should really adopt a habit of adding a test case before code.
>
>> +-- Check case where the foreign partition is a subplan target rel and
>> +-- foreign parttion is locally modified (target table being joined
>> +-- locally prevents a direct/remote modification plan).
>> +explain (verbose, costs off)
>> +update utrtest set a = 1 from (values (1), (2)) s(x) where a = s.x
>> returning *;
>> + QUERY PLAN
>> +------------------------------------------------------------------------------
>>
>> + Update on public.utrtest
>> + Output: remp.a, remp.b, "*VALUES*".column1
>> + Foreign Update on public.remp
>> + Remote SQL: UPDATE public.loct SET a = $2 WHERE ctid = $1 RETURNING
>> a, b
>> + Update on public.locp
>> + -> Hash Join
>> + Output: 1, remp.b, remp.ctid, "*VALUES*".*, "*VALUES*".column1
>> + Hash Cond: (remp.a = "*VALUES*".column1)
>> + -> Foreign Scan on public.remp
>> + Output: remp.b, remp.ctid, remp.a
>> + Remote SQL: SELECT a, b, ctid FROM public.loct FOR UPDATE
>> + -> Hash
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> + -> Values Scan on "*VALUES*"
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> + -> Hash Join
>> + Output: 1, locp.b, locp.ctid, "*VALUES*".*, "*VALUES*".column1
>> + Hash Cond: (locp.a = "*VALUES*".column1)
>> + -> Seq Scan on public.locp
>> + Output: locp.b, locp.ctid, locp.a
>> + -> Hash
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> + -> Values Scan on "*VALUES*"
>> + Output: "*VALUES*".*, "*VALUES*".column1
>> +(24 rows)
>>
>> In this test case, the foreign partition is updated before ri_FdwState is
>> overwritten by postgresBeginForeignInsert() that's invoked by the tuple
>> routing code, so it would work without any fix. So I modified this test
>> case as such.
>
> Thanks for fixing that. I hadn't noticed that foreign partition remp is
> updated before local partition locp; should've added a locp0 for values
> (0) so that remp appears second in the ModifyTable's subplans. Or, well,
> make the foreign table remp appear later by making it a partition for
> values in (3) as your patch does.
>
> BTW, have you noticed that the RETURNING clause returns the same row twice?
I noticed this, but I didn't think it hard. :(
> +update utrtest set a = 3 from (values (2), (3)) s(x) where a = s.x
> returning *;
> + a | b | x
> +---+-------------------+---
> + 3 | qux triggered ! | 2
> + 3 | xyzzy triggered ! | 3
> + 3 | qux triggered ! | 3
> +(3 rows)
>
> You can see that the row that's moved into remp is returned twice (one
> with "qux triggered!" in b column), whereas it should've been only once?
Yeah, this is unexpected behavior, so will look into this. Thanks for
pointing that out!
Best regards,
Etsuro Fujita
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2019-04-11 11:52:40 | Re: Pluggable Storage - Andres's take |
Previous Message | Anastasia Lubennikova | 2019-04-11 11:30:20 | Re: Failure in contrib test _int on loach |