Re: delta relations in AFTER triggers

From: David Fetter <david(at)fetter(dot)org>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: delta relations in AFTER triggers
Date: 2014-06-19 02:49:00
Message-ID: 20140619024900.GE17042@fetter.org
Lists: pgsql-hackers

On Wed, Jun 18, 2014 at 03:30:34PM -0700, Kevin Grittner wrote:
> David Fetter <david(at)fetter(dot)org> wrote:
> > Robert Haas wrote:
> >> Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>
> > The good:
> >     - Generating the tuplestores.  Yay!
>
> Thanks for that.  ;-)

Sorry, I just can't resist references to Spaghetti Westerns.
https://en.wikipedia.org/wiki/The_Good,_the_Bad_and_the_Ugly

> > The bad:
> >     - Generating them exactly and only for AFTER triggers
>
> The standard only allows them for AFTER triggers, and I'm not sure
> what the semantics would be for any others.

Ah, so here's where we differ. You're looking at deltas, a very nice
capability to have. I'm looking at the before and after tuplestores
as components from which deltas, among many other things, could be
composed.

> >     - Requiring that the tuplestores both be generated or not at
> >       all.  There are real use cases described below where only
> >       one would be relevant.
>
> Yeah.
>
> >     - Generating the tuplestores unconditionally.
>
> Well, there are conditions.  Only when the reloption allows and
> only if there is an AFTER trigger for the type of operation in
> progress.

For deltas, this is just the thing.

I'm vaguely picturing the following as infrastructure:

- Instead of modifying Rel, we modify Query to contain two more bools,
both default false: hasBeforeTuplestore and hasAfterTuplestore
- Each use case we implement would set 0 or more of these to true.
For the delta use case, appropriate trigger definitions would set
both.

This is vague because I haven't really gotten hacking on it, just
exploring what I hope are the relevant parts of the code.
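
To make the two-flag idea concrete, here is a minimal sketch. It is not
PostgreSQL code: SketchQuery, SketchCaller and sketch_request_tuplestores
are hypothetical names standing in for the real Query node and for
whatever ends up describing a trigger, matview or RETURNING clause.
Only the two field names come from the proposal above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two proposed fields; not the real Query node. */
typedef struct SketchQuery
{
    bool        hasBeforeTuplestore;    /* capture the "before" rows */
    bool        hasAfterTuplestore;     /* capture the "after" rows */
} SketchQuery;

/* Hypothetical description of what one caller (trigger, DML, ...) needs. */
typedef struct SketchCaller
{
    bool        needsBefore;
    bool        needsAfter;
} SketchCaller;

/*
 * OR every caller's requirements into the query.  Both flags start out
 * false, so a tuplestore is requested only if at least one caller asks.
 */
void
sketch_request_tuplestores(SketchQuery *query,
                           const SketchCaller *callers, size_t ncallers)
{
    assert(query != NULL);
    assert(callers != NULL || ncallers == 0);

    for (size_t i = 0; i < ncallers; i++)
    {
        query->hasBeforeTuplestore |= callers[i].needsBefore;
        query->hasAfterTuplestore |= callers[i].needsAfter;
    }
}
```

In this sketch a delta-style trigger would set both needs, a caller that
only looks at the old rows would set just needsBefore, and a statement
with no interested callers leaves both flags false.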

> > The ugly:
> >     - Attaching tuplestore generation to tables rather than
> >       callers (triggers, DML, etc.)
>
> I'm not sure what you're getting at here.  This patch is
> specifically only concerned with generating delta relations for DML
> AFTER triggers, although my hope is that it will be a basis for
> delta relations used for other purposes.  This seems to me like the
> right place to initially capture the data for incremental
> maintenance of materialized views, and might be of value for other
> purposes, too.

Hrm. I don't really see this stuff as table properties. The
materialized view case is an obvious example where the matview, not
the relations underneath, wants this information. The relations
underneath may have their own concerns, but it's the matview whose
existence should ensure that the tuplestores are being generated.

Once the last depending-on-one-of-the-tuplestores thing is gone, and
this could simply be the end of a RETURNING query, the tuplestores go
away.

> > [formal definition of standard CREATE TRIGGER statement]
>
> > Sorry that was a little verbose, but what it does do is give us
> > what we need at trigger definition time.  I'd say it's pilot
> > error if a trigger definition says "make these tuplestores" and
> > the trigger body then does nothing with them, which goes to
> > Robert's point below re: unconditional overhead.
>
> Yeah, the more I think about it (and discuss it) the more I'm
> inclined to suffer the additional complexity of the standard syntax
> for specifying transition relations in order to avoid unnecessary
> overhead creating them when not needed.  I'm also leaning toward
> just storing TIDs in the tuplestores, even though it requires mixed
> snapshots in executing queries in the triggers.

So in this case one tuplestore with two TIDs, either of which might be
NULL?

> just seems like there will otherwise be too much overhead in copying
> around big, unreferenced columns for some situations.

Yeah, it'd be nice to have the minimal part be as slim as possible.

> > Along that same line, we don't always need to capture both the
> > before tuplestores and the after ones.  Two examples of this come
> > to mind:
> >
> > - BEFORE STATEMENT triggers accessing rows, where there is no
> > after part to use,
>
> Are you talking about an UPDATE for which the AFTER trigger(s) only
> reference the before transition table, and don't look at AFTER?  If
> so, using the standard syntax would cover that just fine.  If not,
> can you elaborate?

Sorry I was unclear. I was looking at one of the many things having
these tuplestores around could enable. As things stand now, there is
no access of any kind to rows with any per-statement trigger, modulo
user-space hacks like this one:

http://people.planetpostgresql.org/dfetter/index.php?/archives/71-Querying-Rows-in-Statement-Triggers.html

Having the "before" tuplestore available to a BEFORE STATEMENT trigger
would make it possible to do things with the before transition table
that are fragile and hacky now.

> > and
> > - DML (RETURNING BEFORE, e.g.) which only touches one of them.
> > This applies both to extant use cases of RETURNING and to planned
> > ones.
>
> I think that can be sorted out by a patch which implements that, if
> these deltas even turn out to be the appropriate way to get that
> data (which is not clear to me at this time).

Again, I see the tuplestores as infrastructure for both deltas and many
other things, so long as they're attached to the right objects. In my
opinion, the right objects would include materialized views, triggers,
and certain very specific kinds of DML of which all the RETURNING ones
are one example. They would not include the underlying tables.

> standard
> syntax, the first thing would be for the statement to somehow
> communicate to the trigger layer the need to capture a tuplestore
> it might otherwise not generate, and there would need to be a way
> for the statement to access the needed tuplestore(s).

Right. Hence my proposal to make the existence of the tuplestores
part of Query, writeable by the types of triggers which specify that
they'll be needed.

> The statement would also need to project the right set of columns.
> None of that seems to me to be relevant to this patch.  If this
> patch turns out to provide infrastructure that helps, great.  If you
> have a specific suggestion about how to make the tuplestores more
> accessible to other layers, I'm listening.

See above :)

> > In summary, I'd like to propose that the tuplestores be generated
> > separately in general and attached to callers. We can optimize
> > this by not generating redundant tuplestores.
>
> Well, if we use the standard syntax for CREATE TRIGGER and store
> the transition table names (if any) in pg_trigger, the code can
> generate one relation if any AFTER triggers which are going to fire
> need it.  I don't see any point in generating exactly the same
> tuplestore contents for each trigger.  And I suspect that we can wire
> in any other uses later when we have something to connect them to.

Yes. I just don't think that Rel is the place to connect them.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
