From: | Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Event Triggers: adding information |
Date: | 2013-01-18 09:18:56 |
Message-ID: | m2622u97cv.fsf@2ndQuadrant.fr |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I'm somewhat bemused by this comment. I don't think
>> CommandCounterIncrement() is an article of faith. If you execute a
>> command to completion, and do a CommandCounterIncrement(), then
>> whatever you do next will look just like a new command, so it should
>> be safe to run user-provided code there. But if you stop in the
Exactly my thinking.
>> middle of a command, do a CommandCounterIncrement(), run some
>> user-provided code, and then continue on, the
>> CommandCounterIncrement() by itself protects you from nothing. If the
Yes. That's why I've been careful not to add CommandCounterIncrement()
when adding the Event Trigger facilities. The only one we added is so
that the next trigger sees what the previous one just did. That's all.
So the patch is not adding CCI calls to pretend that an existing place
in the code is now safe, it's relying on CCI as existing protocol in the
implementation to denote existing places in the code where calling user
code should be safe, because a logical block of a command just has been
done and is finished.
>> code is not expecting arbitrary operations to be executed at that
>> point, and you execute them, you get to keep both pieces. Indeed,
>> there are some places in the code where inserting a
>> CommandCounterIncrement() *by itself* could be unsafe.
I agree, it's pretty obvious. And that's why I'm not adding any.
> I think it's quite likely that we should *not* expect that we can safely
> do CommandCounterIncrement at any random place. That isn't just "x++".
Agreed. That's what the patch is doing, relying on the ones that are
already in the code, and being very careful not to add any new one.
> It causes relcache and catcache flushes for anything affected by the
> command-so-far, since it's making the command's results visible --- and
> the code may not be expecting the results to become visible yet, and
> almost certainly isn't expecting cache entries to disappear under it.
> So a CCI could break things whether or not the event trigger itself did
> anything at all.
Yes.
> But this is just one part of the general problem that injecting random
> code mid-command is risky as hell, and isn't likely to become less so.
Yes.
That's why there's nothing in the proposed patch that does that.
The user code is run either before the command starts, the event is
called ddl_command_start, and in the case of a DROP command is has to do
an extra lookup only to guarantee that we're not trying to inject user
code in a random point in the execution of the existing code.
Or the user code is run after the command is done, when it has called
itself CommandCounterIncrement() and is ready to resume execution to its
caller, and that event is called ddl_command_end.
> Frankly I don't really wish to see that happen, and don't immediately
> see good end-user reasons (as opposed to structure-of-the-current-code
> reasons) to need it. I'd really like to get to a point where we can
> define things as happening like this:
>
> * collect information needed to interpret the DDL command
> (lookup and lock objects, etc)
> * fire "before" event triggers, if any (and if so, recheck info)
> * do the command
> * fire "after" event triggers, if any
I think that if you do the lookup and lock of objects before running the
command, then you also want to do it again after the event trigger has
run, because you just don't know what it did.
That's why I implemented that way:
* no information is given to ddl_command_start event triggers
* information is collected while the code runs normally
* the ddl_command_end event has the information and only uses it if the
object still exists
Now, the extra complexity is that we also want the information at
ddl_command_start for a DROP command, so there's an explicit extra
lookup here in *only that case* and *only when some event triggers are
registered* (which is checked from a cache lookup), and only when
there's a single object targeted by the command.
And when doing that extra lookup we take a ShareUpdateExclusiveLock on
the object so that someone else won't drop it before we get to run the
Event Trigger User Function.
> This might require that the command execution save aside information
> that would help us fire appropriate event triggers later. But I'd much
> rather pay that overhead than fire triggers mid-command just because
> that's where the info is currently available.
And that's what is implemented in my patch. The information that's kept
aside is the OID of the object being the target of the command.
Last year, in the patch that didn't make it in 9.2, I used to keep the
whole EventTriggerData structure around and fill it in (it had another
name in the 9.2 era patch) from all the commands/* implementations.
Robert didn't like that, at all, because of the code foot print IIRC.
So I've been very careful in that version of the patch not to implement
things exactly as you're saying, because what you're saying looks very
much like my first approach from last year, the one that crashed and
burnt.
For full disclosure though, IIRC, the patch from that era also did try
to call Event Trigger User Code at dangerous places in the middle of the
commands implementation. That was removed before the end of the commit
fests, though, but we then had to say no for 9.2.
> BTW, how badly do we need the "before" case at all? If we were to
> restrict our ambitions to "after" triggers, it would perhaps be
> relatively simple for DDL execution to create data structures noting
> what it does as it goes along, and then use those to fire "after"
> triggers at the end (and provide desired info to them too). Still
> a lot of new code of course, but not very much risk involved.
I personally need the ddl_command_start case for implementing some
CREATE EXTENSION features in user code. And I would need to also have
the extension's name from the parsetree at that point. In the current
state of the patch, that will mean that I need a little new function
coded in C and loaded in the backend to be able to get at it.
> [ reads further... ] Robert seems to have pretty much that same idea
> down at the end:
>
>> One idea would be to save off all the names of the objects actually
>> dropped at the time the drops are done, and then pass it to the event
>> trigger afterward.
We can go even further in simplicity.
Document that you only get information about objects that exists at the
time the event trigger user function is run. Then you don't have any
specific information for a DROP command at ddl_command_end, and you
don't need any code to make it happen. You also don't have the
information at ddl_command_start for a CREATE operation.
That happens to be what my patch implements, too.
I think that the refactoring of the DDL commands in several explicit
steps of parsing, analysis then execution is the way forward, and I
didn't see any way to provide that in 1 cycle (that was my first target,
remember), and even if I had then known that it would take 2 cycles for
the limited edition of the feature, I don't think I would have called
that path realistic. As you said, this refactoring looks more like a 3
cycles effort in itself, let alone building anything on top of it.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2013-01-18 09:20:57 | Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave |
Previous Message | Kohei KaiGai | 2013-01-18 09:12:57 | Re: Move postgresql_fdw_validator into dblink |