From: | Andres Freund <andres(at)anarazel(dot)de> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Étienne BERSAC <etienne(dot)bersac(at)dalibo(dot)com>, jtc331(at)gmail(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, rafaelthca(at)gmail(dot)com, jian(dot)universality(at)gmail(dot)com |
Subject: | Re: RFC: Logging plan of the running query |
Date: | 2024-02-15 18:59:11 |
Message-ID: | 20240215185911.v4o6fo444md6a3w7@awork3.anarazel.de |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
On 2024-02-15 14:42:11 +0530, Robert Haas wrote:
> I think the issue is very general. We have lots of subsystems that
> both (a) use global variables and (b) contain CHECK_FOR_INTERRUPTS().
> If we process an interrupt while that code is in the middle of
> manipulating its global variables and then again re-enter that code,
> bad things might happen. We could try to rule that out by analyzing
> all such subsystems and all places where CHECK_FOR_INTERRUPTS() may
> appear, but that's very difficult. Suppose we took the alternative
> approach recommended by Andrey Lepikhov in
> http://postgr.es/m/b1b110ae-61f6-4fd9-9b94-f967db9b53d4@app.fastmail.com
> and instead set a flag that gets handled in some suitable place in the
> executor code, like ExecProcNode(). If we did that, then we would know
> that we're not in the middle of a syscache lookup, a catcache lookup,
> a heavyweight lock acquisition, an ereport, or any other low-level
> subsystem call that might create problems for the EXPLAIN code.
>
> In that design, the hack shown above would go away, and we could be
> much more certain that we don't need any other similar hacks for other
> subsystems. The only downside is that we might experience a slightly
> longer delay before a requested EXPLAIN plan actually shows up, but
> that seems like a pretty small price to pay for being able to reason
> about the behavior of the system.
I am very wary of adding overhead to ExecProcNode() - I'm quite sure that
adding code there would trigger visible overhead for query times.
If we went with something like tht approach, I think we'd have to do something
like redirecting node->ExecProcNode to a wrapper, presumably from within a
CFI. That wrapper could then implement the explain support, without slowing
down the normal execution path.
> I don't *think* there are any cases where we run in the executor for a
> particularly long time without a new call to ExecProcNode().
I guess it depends on what you call a long time. A large sort, for example,
could spend a fair amount of time inside tuplesort, similarly, a gather node
might need to wait for a worker for a while etc.
> It's really hard for me to accept that the heavyweight lock problem
> for which the patch contains a workaround is the only one that exists.
> I can't see any reason why that should be true.
I suspect you're right.
Greetings,
Andres Freund
From | Date | Subject | |
---|---|---|---|
Next Message | Maiquel Grassi | 2024-02-15 19:47:05 | RE: Psql meta-command conninfo+ |
Previous Message | Corey Huinker | 2024-02-15 18:41:57 | Re: Document efficient self-joins / UPDATE LIMIT techniques. |