From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rafael Thofehrn Castro <rafaelthca(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Progressive explain
Date: 2025-03-20 20:06:59
Message-ID: CA+TgmoZw+xy3WyrPD4X5ChP+=kJ4ncRZY8xQu0XfChDa4fa7yg@mail.gmail.com
Lists: pgsql-hackers
On Wed, Mar 19, 2025 at 6:53 PM Rafael Thofehrn Castro
<rafaelthca(at)gmail(dot)com> wrote:
> The strategy I used here is to use a MemoryContextCallback
> (ProgressiveExplainReleaseFunc), configured in the memory context
> where the query is being executed, being responsible for calling
> ProgressiveExplainCleanup() if the query doesn't end gracefully.
Thanks for the pointer. I'm a bit skeptical about what's going on here
in ProgressiveExplainReleaseFunc(). It seems like we're depending on
shared memory to tell us whether we need to do purely backend-local
cleanup, like calling disable_timeout() and resetting
ProgressiveExplainPending and activeQueryDesc. I would have expected
us to keep track in local memory of whether this kind of work needs to
be done. It seems roundabout to take an LWLock, do a hash table lookup
to see if there's an entry there, release the LWLock, and then very
shortly thereafter take the lock a second time to release the entry
that we now know is there.
The comment in ProgressiveExplainCleanup about only detaching from the
DSA if the query ended gracefully is not ideal from my point of view
because it says what the code does instead of why the code does that
thing. Also, the function is seemingly called with queryDesc as an
argument not because you need it for anything but because you're going
to test whether it's null. In that case, you could just pass a
Boolean. Even then, something seems odd about this: why do we have to
be holding ProgressiveExplainHashLock to dsa_detach the
somewhat-inscrutably named area pe_a? And why are we not detaching it
in case of error?
I am wondering why you chose this relatively unusual error cleanup
strategy. What I would have done is invent AtEOXact_ProgressiveExplain
and AtSubEOXact_ProgressiveExplain. In some sense this looks simpler,
because it doesn't need separate handling for transactions and
subtransactions, but it's so unlike what we do in most places that
it's hard for me to tell whether it's correct. I feel like the
approach you've chosen here would make sense if what we wanted to do
was basically release memory or some memory-like resource associated
closely with the context -- e.g. expandedrecord.c releases a
TupleDesc, but this is doing more than that.
I think the effect of this choice is that cleanup of the
progressive-EXPLAIN stuff happens much later than it normally would.
Most of the time, in the case of an abort, we would want
AbortTransaction() to release as many resources as possible, leaving
basically no work to do at CleanupTransaction() time. This is so that
if a user types "SELECT 1/0;" we release resources, as far as
possible, right away, and don't wait for them to subsequently type
"ROLLBACK;". The transaction lives on only as a shell. But these
resources, if I'm reading this right, are going to stick around until
the transaction is actually rolled back, because memory is not freed
until CleanupTransaction() time. I wonder what happens if a query
inside of an explicit transaction aborts after putting an entry in the
progressive-explain view. My guess is that the entry will stick around
until the actual rollback happens.
In fact, now that I think about it, this is probably why we don't
dsa_detach() in ProgressiveExplainCleanup() in cases of error -- the
resource owner cleanup will have already released the DSA segments
long before the memory context is deleted.
I'm sort of inclined to say that this should be rewritten to do error
cleanup in a more normal way. It's probably more code to do it that
way, but I think having it be more similar to other subsystems is
probably worth quite a bit.

> > It seems like when we replace a longer entry with a shorter one, we
> > forget that it was originally longer. Hence, if the length of a
> > progressive EXPLAIN is alternately 2922 characters and 2923
> > characters, we'll reallocate on every other progressive EXPLAIN
> > instead of at most once.
>
> Are you talking about re-printing the plan in the same query execution?
Yes.
> The logic for the code, using your example, would be to allocate 2922 +
> PROGRESSIVE_EXPLAIN_FREE_SIZE (4096, currently) initially. If next plans
> alternate between 2922 and 2923 no additional allocation will be done.
> A reallocation will be needed only if the plan length ends up exceeding
> 2922+4096. At the end of query execution (or cancellation) that DSA will
> be freed and a next query execution will have to allocate again using the
> same logic.
It seems to me that if ProgressiveExplainPrint() reaches /* Update
shared memory with new data */ without reallocating,
strlen(pe_data->plan) can be reduced. On the next trip through the
function, we don't know whether the string we're seeing is the
original string -- for which strlen() + PROGRESSIVE_EXPLAIN_FREE_SIZE
gives us the original allocation size -- or whether the string we're
seeing is a shorter one that was copied over the original, longer
string. PROGRESSIVE_EXPLAIN_FREE_SIZE is big enough that this probably
isn't much of a problem in practice, because consecutive EXPLAIN
outputs for the same query probably won't vary in size by enough to
cause any reallocation. Looking at this more carefully, I think that
the query plan would have to shrink in size by >4kB and then expand
again in order to trigger reallocation, which seems unlikely. But it
still seems unclean to me. Normally, we track how much space we have
actually allocated explicitly, instead of reconstructing that number
from something else, especially something that isn't guaranteed to
produce an accurate result. I think you should just store the number
of available payload bytes at the beginning of the chunk and then
reallocate if it isn't big enough to hold the payload that you have
got.
> Regarding the execProcNode wrapper strategy. I used it precisely because
> of the discussion in that other patch. I actually tried not using it here,
> and call ProgressiveExplainPrint() in the timeout callback. This resulted
> in sporadic crashes, confirming the suspicion that it is not a good
> idea.
Makes sense, but we need adequate documentation of what we did and why
it works (or at least why we think it works).
Another thing I just noticed is that pg_stat_progress_explain() uses
beentry->st_procpid == ped->pid as the permissions check, but a more
typical test is HAS_PGSTAT_PERMISSIONS(beentry->st_userid). I know
that's only in pgstatfuncs.c, but I think it would be OK to duplicate
the associated test here (i.e. has_privs_of_role(GetUserId(),
ROLE_PG_READ_ALL_STATS) || has_privs_of_role(GetUserId(), role)). I
don't really see a reason why this shouldn't use the same permission
rules as other pg_stat_ things, in particular pg_stat_get_activity().
--
Robert Haas
EDB: http://www.enterprisedb.com