Re: a misbehavior of partition row movement (?)

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Arne Roland <A(dot)Roland(at)index(dot)de>, "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: a misbehavior of partition row movement (?)
Date: 2021-10-14 09:00:51
Message-ID: CA+HiwqFpeYBXT=Ss8Ya-FMxeHPYEz0p-+BdBa5neq7pD3hB3ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Sep 20, 2021 at 3:32 PM Amit Langote <amitlangote09(at)gmail(dot)com> wrote:
> The problem was that the tuplestore
> (afterTriggers.query_stack[query_level].tuplestore) that I decided to
> use to store the AFTER trigger tuples of a partitioned table that is
> the target of an cross-partition update lives only for the duration of
> a given query. So that approach wouldn't work if the foreign key
> pointing into that partitioned table is marked INITIALLY DEFERRED. To
> fix, I added a List field to AfterTriggersData that stores the
> tuplestores to store the tuples of partitioned tables that undergo
> cross-partition updates in a transaction and are pointed to by
> INITIALLY DEFERRED foreign key constraints. I couldn't understand one
> comment about why using a tuplestore for such cases *might not* work,
> which as follows:
>
> * Note that we need triggers on foreign tables to be fired in exactly the
> * order they were queued, so that the tuples come out of the tuplestore in
> * the right order. To ensure that, we forbid deferrable (constraint)
> * triggers on foreign tables. This also ensures that such triggers do not
> * get deferred into outer trigger query levels, meaning that it's okay to
> * destroy the tuplestore at the end of the query level.
>
> I tried to break the approach using various test cases (some can be
> seen added by the patch to foreign_key.sql), but couldn't see the
> issue alluded to in the above comment. So I've marked the comment
> with an XXX note as follows:
>
> - * Note that we need triggers on foreign tables to be fired in exactly the
> - * order they were queued, so that the tuples come out of the tuplestore in
> - * the right order. To ensure that, we forbid deferrable (constraint)
> - * triggers on foreign tables. This also ensures that such triggers do not
> - * get deferred into outer trigger query levels, meaning that it's okay to
> - * destroy the tuplestore at the end of the query level.
> + * Note that we need triggers on foreign and partitioned tables to be fired in
> + * exactly the order they were queued, so that the tuples come out of the
> + * tuplestore in the right order. To ensure that, we forbid deferrable
> + * (constraint) triggers on foreign tables. This also ensures that such
> + * triggers do not get deferred into outer trigger query levels, meaning that
> + * it's okay to destroy the tuplestore at the end of the query level.
> + * XXX - update this paragraph if the new approach, whereby tuplestores in
> + * afterTriggers.deferred_tuplestores outlive any given query, can be proven
> + * to not really break any assumptions mentioned here.
>
> If anyone reading this can think of the issue the original comment
> seems to be talking about, please let me know.

I brought this up in an off-list discussion with Robert and he asked
why I hadn't considered not using tuplestores to remember the tuples
in the first place, specifically pointing out that it may lead to
unnecessarily consuming a lot of memory when such updates move a bunch
of rows around. Like, we could simply remember the tuples to be
passed to the trigger function using their CTIDs as is done for normal
(single-heap-relation) updates, though in this case also remember the
OIDs of the source and the destination partitions of a particular
cross-partition update.

I had considered that idea before but I think I had overestimated the
complexity of that approach so didn't go with it. I tried and the
resulting patch doesn't look all that complicated, with the added
bonus that the bulk update case shouldn't consume so much memory with
this approach like the previous tuplestore version would have.

Updated patches attached.

--
Amit Langote
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v10-0001-Create-foreign-key-triggers-in-partitioned-table.patch application/octet-stream 42.7 KB
v10-0002-Enforce-foreign-key-correctly-during-cross-parti.patch application/octet-stream 57.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2021-10-14 09:28:55 Re: Reset snapshot export state on the transaction abort
Previous Message Greg Nancarrow 2021-10-14 08:45:33 Re: Skipping logical replication transactions on subscriber side