Re: BUG #14808: V10-beta4, backend abort

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andrew Gierth <rhodiumtoad(at)postgresql(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Kevin Grittner <kgrittn(at)gmail(dot)com>
Subject: Re: BUG #14808: V10-beta4, backend abort
Date: 2017-09-15 00:12:35
Message-ID: CAEepm=2Fu=W7MFqsTcBGQEFwv05F6DWV0mv+vnPXReUcmeZ4Kg@mail.gmail.com
Lists: pgsql-bugs

On Fri, Sep 15, 2017 at 9:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Attached is a draft patch for this.

Some initial feedback:

Compiles cleanly and make check-world passes here.

with wcte as (insert into table1 values (42))
insert into table2 values ('hello world');
NOTICE: trigger = table2_trig, new table = ("hello world")
NOTICE: trigger = table1_trig, new table = (42)
+with wcte as (insert into table1 values (43))
+ insert into table1 values (44);
+NOTICE: trigger = table1_trig, new table = (43), (44)

The effects of multiple ModifyTable nodes on the same table are
merged. Good. I doubt anyone will notice but it might warrant a note
somewhere that there is a user-visible change here: previously if you
did this (without TTs) your trigger would have fired twice.
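
Just to spell that out with a throwaway sketch (not from the
regression tests; table and function names made up, plain
statement-level trigger with no REFERENCING clause):

create table t (a int);
create or replace function note_stmt() returns trigger language plpgsql as
$$ begin raise notice 'trigger = %', TG_NAME; return null; end; $$;
create trigger t_stmt_trig after insert on t
  for each statement execute procedure note_stmt();
with wcte as (insert into t values (1))
  insert into t values (2);

Previously that raised the NOTICE twice, once per ModifyTable node;
with the patch, if I'm reading it right, the two queued firings are
merged into one.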

+create trigger my_table_col_update_trig
+ after update of b on my_table referencing new table as new_table
+ for each statement execute procedure dump_insert();
+ERROR: transition tables cannot be specified for triggers with column lists

Potential SQL standard non-compliance avoided. Good.

delete from refd_table where length(b) = 3;
-NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b")
-NOTICE: trigger = trig_table_delete_trig, old table = (11,"one a"), (11,"one b")
+NOTICE: trigger = trig_table_delete_trig, old table = (2,"two a"), (2,"two b"), (11,"one a"), (11,"one b")

The effects of the fk cascade machinery are merged as discussed. Good, I
think, but I'm a bit confused about how this works when the cascading
operation also fires triggers.

All the other tests show no change in behaviour. Good.

What is going on here?

=== setup ===
create or replace function dump_delete() returns trigger language plpgsql as
$$
  begin
    raise notice 'trigger = %, old table = %, depth = %',
                 TG_NAME,
                 (select string_agg(old_table::text, ', ' order by a)
                    from old_table),
                 pg_trigger_depth();
    return null;
  end;
$$;
create table foo (a int primary key, b int references foo(a) on delete cascade);
create trigger foo_s_trig after delete on foo
  referencing old table as old_table
  for each statement
  execute procedure dump_delete();
create trigger foo_r_trig after delete on foo
  referencing old table as old_table
  for each row
  execute procedure dump_delete();
insert into foo values (1, null), (2, 1);
===8<===

postgres=# delete from foo where a = 1;
NOTICE: trigger = foo_r_trig, old table = (1,), depth = 1
NOTICE: trigger = foo_s_trig, old table = (1,), depth = 1
NOTICE: trigger = foo_r_trig, old table = (2,1), depth = 1
NOTICE: trigger = foo_s_trig, old table = (2,1), depth = 1
NOTICE: trigger = foo_s_trig, old table = <NULL>, depth = 1
DELETE 1

> Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> writes:
>> 1. Merging the transition tables when there are multiple wCTEs
>> referencing the same table. Here's one idea: Rename
>> MakeTransitionCaptureState() to GetTransitionCaptureState() and use a
>> hash table keyed by table OID in
>> afterTriggers.transition_capture_states[afterTriggers.query_depth] to
>> find the TCS for the given TriggerDesc or create it if not found, so
>> that all wCTEs find the same TransitionCaptureState object. Then all
>> existing callers continue to do what they're doing now, but they'll be
>> sharing TCSs appropriately with other things in the plan. Note that
>> TransitionCaptureState already holds tuplestores for each operation
>> (INSERT, UPDATE, DELETE) so the OID of the table alone is a suitable
>> key for the hash table (assuming we are ignoring the column-list part
>> of the spec as you suggested).
>
> It seems unsafe to merge the TCS objects themselves, because the callers
> assume that they can munge the tcs_map and tcs_original_insert_tuple
> fields freely without regard for any other callers. So as I have it,
> we still have a TCS for each caller, but the TCSes point at tuplestores
> that can be shared across multiple callers for the same event type.
> The tuplestores themselves are managed by the AfterTrigger data
> structures. Also, because the TCS structs are just throwaway per-caller
> data, it's uncool to reference them in the trigger event lists.
> So I replaced ats_transition_capture with two pointers to the actual
> tuplestores.

OK, that works, and yeah, it may be conceptually better. Also, maybe
keeping tcs_original_insert_tuple as a member of TCS is a bit clunky
and could be reconsidered later: I was trying to avoid widening a
bunch of function calls, but that might have been optimising for the
wrong thing...
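
Just to convince myself of the user-visible effect of sharing the
tuplestores across multiple wCTEs on the same table, I'm imagining
something like this (hypothetical, reusing the table1/table2 setup
from the regression tests quoted at the top):

with wcte1 as (insert into table1 values (45)),
     wcte2 as (insert into table1 values (46))
  insert into table2 values ('hello again');

With the shared tuplestores, table1_trig should presumably fire just
once with new table = (45), (46), and table2_trig once with ("hello
again"), rather than once per ModifyTable node.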

> That bloats AfterTriggerSharedData a bit but I think it's
> okay; we don't expect a lot of those structs in a normal query.

Yeah, it was bloat avoidance that led me to find a way to use a single
pointer. But at least it's in the shared part, so I don't think it
matters too much.

> I chose to make the persistent state (AfterTriggersTableData) independent
> for each operation type. We could have done that differently perhaps, but
> it seemed more complicated and less likely to match the spec's semantics.

OK. Your GetAfterTriggersTableData(Oid relid, CmdType cmdType) is
mirroring the spec's way of describing what happens (without the
column list).

+    foreach(lc, qs->tables)
+    {
+        table = (AfterTriggersTableData *) lfirst(lc);
+        if (table->relid == relid && table->cmdType == cmdType &&
+            !table->closed)
+            return table;
+    }

Yeah, my suggestion of a hash table was overkill. (Maybe in future if
we change our rules around inheritance this could finish up being
searched for a lot of child tables; we can cross that bridge when we
come to it.)

I'm a little confused about the "closed" flag. This seems to imply
that it's possible for there to be more than one separate tuplestore
for a given (table, operation) in a given trigger execution context.
Why is that OK?

> The INSERT ON CONFLICT UPDATE mess is handled by creating two separate
> TCSes with two different underlying AfterTriggersTableData structs.
> The insertion tuplestore sees only the inserted tuples, the update
> tuplestores see only the updated-pre-existing tuples. That adds a little
> code to nodeModifyTable but it seems conceptually much cleaner.

OK. The resulting behaviour is unchanged.
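
To spell out the expected split with a hypothetical example (not from
the regression tests; dump_insert() is the helper from the tests
quoted above, while dump_update_tt() is a made-up analogue sketched
here for completeness):

create or replace function dump_update_tt() returns trigger language plpgsql as
$$
  begin
    -- hypothetical helper, modelled on dump_delete()/dump_insert()
    raise notice 'trigger = %, old table = %, new table = %',
                 TG_NAME,
                 (select string_agg(old_table::text, ', ' order by a)
                    from old_table),
                 (select string_agg(new_table::text, ', ' order by a)
                    from new_table);
    return null;
  end;
$$;
create table upsert_demo (a int primary key, b text);
create trigger upsert_demo_ins_trig after insert on upsert_demo
  referencing new table as new_table
  for each statement execute procedure dump_insert();
create trigger upsert_demo_upd_trig after update on upsert_demo
  referencing old table as old_table new table as new_table
  for each statement execute procedure dump_update_tt();
insert into upsert_demo values (1, 'old');
insert into upsert_demo values (1, 'new'), (2, 'brand new')
  on conflict (a) do update set b = excluded.b;

If I've understood the split correctly, upsert_demo_ins_trig's new
table should contain only (2,"brand new"), and upsert_demo_upd_trig
should see only the (1,old) -> (1,new) transition.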

>> 3. Merging the after-statement trigger invocations so that if you
>> updated the same table directly and also via a wCTE and also
>> indirectly via fk ON DELETE/UPDATE trigger then you still only get one
>> invocation of the after statement trigger. Not sure exactly how...
>
> What I did here was to use the AfterTriggersTableData structs to hold
> a flag saying we'd already queued statement triggers for this rel and
> cmdType. There's probably more than one way to do that, but this seemed
> convenient.

Seems good. That implements the following (from whatever random draft
spec I have): "A statement-level trigger that is considered as
executed for a state change SC (in a given trigger execution context)
is not subsequently executed for SC."
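
For instance (hypothetical names again, reusing the made-up
dump_update_tt() helper from the sketch above), updating the same
table both directly and via a wCTE in a single statement should now
queue the AFTER UPDATE statement trigger only once:

create table sc_demo (a int primary key, b text);
create trigger sc_demo_upd_trig after update on sc_demo
  referencing old table as old_table new table as new_table
  for each statement execute procedure dump_update_tt();
insert into sc_demo values (1, 'one'), (2, 'two');
with wcte as (update sc_demo set b = 'ONE' where a = 1)
  update sc_demo set b = 'TWO' where a = 2;

Expected: a single NOTICE from sc_demo_upd_trig with both updated
rows in its transition tables, rather than one firing per ModifyTable
node.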

> One thing I don't like too much about that is that it means there are
> cases where the single statement trigger firing would occur before some
> AFTER ROW trigger firings. Not sure if we promise anything about the
> ordering in the docs. It looks quite expensive/complicated to try to
> make it always happen afterwards, though, and it might well be totally
> impossible if triggers cause more table updates to occur.

I suppose that it might be possible for AfterTriggersTableData to
record the location of a previously queued event so that you can later
disable it and queue a replacement, with the effect of suppressing
earlier firings rather than later ones. That might make sense if you
think that after statement triggers should fire after all row
triggers. I can't figure out from the spec whether that's expected
and I'm not sure if it's useful.

> Because MakeTransitionCaptureState now depends on the trigger query
> level being active, I had to relocate the AfterTriggerBeginQuery calls
> to occur before it.

Right.

> In passing, I refactored the AfterTriggers data structures a bit so
> that we don't need to do as many palloc calls to manage them. Instead
> of several independent arrays there's now one array of structs.

Good improvement.

--
Thomas Munro
http://www.enterprisedb.com
