Re: Add CREATE support to event triggers

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add CREATE support to event triggers
Date: 2014-01-03 14:05:22
Message-ID: 20140103140522.GC7035@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Robert Haas wrote:

> The other thing that bothers me here is that, while a normalized
> command string sounds great in theory, as soon as you want to allow
> (for example) mapping schema A on node 1 to schema B on node 2, the
> wheels come off: you'll have to deparse that normalized command string
> so you can change out the schema name and then reassemble it back into
> a command string again. So we're going to parse the user input, then
> deparse it, hand over the results to the application code, which will
> then parse it, modify that, and deparse it again.

I have considered several ideas on this front, but most of them turn out
to be useless or too cumbersome to use. What seems most adequate is to
build a command string containing certain patterns, and an array of
replacement values for such patterns; each pattern corresponds to one
element that somebody might want modified in the command. As a trivial
example, a command such as

CREATE TABLE foo (bar INTEGER);

would return a string like
CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);

and the replacement array would be
{table_schema => "public", table_name => "foo"}

If we additionally provide a function to expand the replacements in the
string, we would have the base functionality of a normalized command
string. If somebody wants to move the table to some other schema, they
can simply modify the array to suit their taste, and again expand using
the provided function; this doesn't require parsing SQL. It's likely
that there are lots of fine details that need exploring before this is a
fully workable idea -- I have just started work on it, so please bear
with me.
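As a rough sketch of what the expansion step could look like (the function name and the exact pattern syntax are illustrative only; no such function existed yet at the time of writing, and a real implementation would also need identifier quoting):

```python
import re

def expand_command(template, replacements):
    # Substitute every ${name} placeholder in the normalized command
    # string with its value from the replacement array.  This is a
    # minimal sketch of the proposed expansion function.
    return re.sub(r"\$\{(\w+)\}",
                  lambda m: replacements[m.group(1)],
                  template)

template = "CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);"
repl = {"table_schema": "public", "table_name": "foo"}

print(expand_command(template, repl))
# CREATE TABLE public.foo (bar INTEGER);

# A replication consumer can remap the schema on the target node
# without ever parsing SQL:
repl["table_schema"] = "schema_b"
print(expand_command(template, repl))
# CREATE TABLE schema_b.foo (bar INTEGER);
```

The point is that the consumer only touches the replacement array; the command text itself is never re-parsed.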

I think this is basically what you call "a JSON blob".

> Finally, I'm very skeptical of the word "normalized". To me, that
> sounds like an alias for "modifying the command string in unspecified
> ways that big brother thinks will be useful to event trigger authors".
> Color me skeptical. What if somebody doesn't want their command
> string normalized? What if they want it normalized in a way that's
> different from the way that we've chosen to normalize it? I fear that
> this whole direction amounts to "we don't know how to design a real
> API so let's just do surgery on the command string and call whatever
> pops out the API".

You might criticize the example above by saying that I haven't
considered using a JSON array for the list of table elements; in a
sense, I would be playing Big Brother and deciding that you (as the
user) don't need to mess with the column/constraint list in a table
you're creating. I thought about it and wasn't sure there was a need to
implement that bit in the first iteration. One neat thing about this
string+replaceables idea is that we can later change which replaceable
elements the string has, providing more functionality (for example, the
column list that was a "constant" in v1 could become replaceable in v2)
without breaking existing users of v1.
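To illustrate that compatibility point concretely (the placeholder names and templates here are hypothetical, not a committed format):

```python
import re

def expand(template, repl):
    # Fill every ${name} placeholder present in the template.
    return re.sub(r"\$\{(\w+)\}", lambda m: repl[m.group(1)], template)

# Hypothetical v1 template: the column list is a literal constant.
v1 = "CREATE TABLE ${table_schema}.${table_name} (bar INTEGER);"
# Hypothetical v2 template: the column list is now replaceable too.
v2 = "CREATE TABLE ${table_schema}.${table_name} (${table_columns});"

repl = {"table_schema": "public", "table_name": "foo",
        "table_columns": "bar INTEGER"}

# A v1-era consumer that only ever rewrites table_schema/table_name
# gets the same expanded command from either template, since expansion
# simply fills whatever placeholders the server chose to emit:
assert expand(v1, repl) == expand(v2, repl)
```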

> > but there
> > is a slight problem for some kind of objects that are represented partly
> > as ALTER state during creation; for example creating a table with a
> > sequence uses ALTER SEQ/OWNED BY internally at some point. There might
> > be other cases I'm missing, also. (The REFRESH command is nominally
> > also supported.)
>
> There are lots of places in the DDL code where we pass around
> constructed parse trees as a substitute for real argument lists. I
> expect that many of those places will eventually get refactored away,
> so it's important that this feature does not end up relying on
> accidents of the current code structure. For example, an
> AlterTableStmt can actually do a whole bunch of different things in a
> single statement: SOME of those are handled by a loop in
> ProcessUtilitySlow() and OTHERS are handled internally by AlterTable.
> I'm pretty well convinced that that division of labor is a bad design,
> and I think it's important that this feature doesn't make that dubious
> design decision into documented behavior.

Yeah, the submitted patch took care of these elements by invoking the
appropriate collection function at all the right places. Most of it
happened right in ProcessUtilitySlow, but other bits were elsewhere (for
instance, sub-objects created in a complex CREATE SCHEMA command). I
mentioned the ALTER SEQUENCE example above because that happens in a
code path that wasn't even close to the rest of the stuff.

> > Now about the questions I mentioned above:
> >
> > a) It doesn't work to reverse-parse the statement nodes in all cases;
> > there are several unfixable bugs if we only do that. In order to create
> > always-correct statements, we need access to the catalogs for the
> > created objects. But if we are doing catalog access, then it seems to
> > me that we can do away with the statement parse nodes completely and
> > just reconstruct the objects from catalog information. Shall we go that
> > route?
>
> That works well for CREATE and is definitely appealing in some ways;
> it probably means needing a whole lot of what's in pg_dump in the
> backend also. Of course, converting the pg_dump code to a library
> that can be linked into either a client or the server would make a lot
> of people happy. Making pg_dump much dumber (at least as regards
> future versions) and relying on new backend code to serve the same
> purpose would perhaps be reasonable as well, although I know Tom is
> against it. But having two copies of that code doesn't sound very
> good; and we'd need some way of thoroughly testing the one living in
> the backend.

Agreed on requiring thorough testing.

> > c) The current patch stashes all objects in a list, whenever there's an
> > event trigger function. But perhaps some users want to have event
> > triggers and not have any use for the CREATE statements. So one idea is
> > to require users that want the objects reported to call a special
> > function in a ddl_command_start event trigger which enables collection;
> > if it's not called, objects are not reported. This eliminates
> > performance impact for people not using it, but then maybe it will be
> > surprising for people that call the SRF and find that it always returns
> > empty.
>
> This seems like premature optimization to me, but I think I lost the
> last iteration of this argument.

I don't think you did; we ended up with a design that neither requires
explicit user action nor causes a performance hit. I wasn't sure about
this, but thought I would mention it just in case. Maybe we will end up
doing the same here and have a create_sql_objects event or something
like that --- not really sure of this part yet, as there's a lot of
infrastructure to design and code first.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
