From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Sergei Kornilov <sk(at)zsrv(dot)org>, Guillaume Lelarge <guillaume(at)lelarge(dot)info>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Expose lock group leader pid in pg_stat_activity
Date: 2020-03-16 04:27:52
Message-ID: 20200316042752.GE26184@telsasoft.com
Lists: pgsql-hackers
On Tue, Jan 28, 2020 at 12:36:41PM +0100, Julien Rouhaud wrote:
> So, AFAICT the LockHashPartitionLockByProc is required when
> iterating/modifying lockGroupMembers or lockGroupLink, but just
> getting the leader pid should be safe.
This still seems unsafe:
git show -U11 -w --patience b025f32e0b src/backend/utils/adt/pgstatfuncs.c
+ /*
+ * If a PGPROC entry was retrieved, display wait events and lock
+ * group leader information if any. To avoid extra overhead, no
+ * extra lock is being held, so there is no guarantee of
+ * consistency across multiple rows.
+ */
...
+ PGPROC *leader;
...
+ leader = proc->lockGroupLeader;
+ if (leader)
+ {
# does something guarantee that leader doesn't change ?
+ values[29] = Int32GetDatum(leader->pid);
+ nulls[29] = false;
+ }
Michael seems to have raised the issue:
On Thu, Jan 16, 2020 at 04:49:12PM +0900, Michael Paquier wrote:
> And actually, the way you are looking at the leader's PID is visibly
> incorrect and inconsistent because the patch takes no shared LWLock on
> the leader using LockHashPartitionLockByProc() followed by
> LWLockAcquire(), no? That's incorrect because it could be perfectly
> possible to crash with this code between the moment you check if
> lockGroupLeader is NULL and the moment you look at
> lockGroupLeader->pid if a process is being stopped in-between and
> removes itself from a lock group in ProcKill().
But I don't see how it was addressed?
I read this:
src/backend/storage/lmgr/lock.c: * completely valid. We cannot safely dereference another backend's
src/backend/storage/lmgr/lock.c- * lockGroupLeader field without holding all lock partition locks, and
src/backend/storage/lmgr/lock.c- * it's not worth that.)
I think if you do:
LWLockAcquire(&proc->backendLock, LW_SHARED);
..then it's at least *safe* to access leader->pid, but it may be inconsistent
unless you also call LockHashPartitionLockByProc.
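For illustration, a hypothetical sketch (not a tested patch) of what taking the lock partition lock before reading the leader's pid might look like; this assumes PostgreSQL internals (PGPROC, LockHashPartitionLockByProc, LWLockAcquire) and the values[29]/nulls[29] slots from the quoted hunk, and is a fragment that only compiles inside the server tree:

	/*
	 * Hypothetical sketch: re-check lockGroupLeader under the leader's
	 * lock partition lock, since the process could leave its lock group
	 * in ProcKill() between the NULL check and the ->pid dereference.
	 */
	leader = proc->lockGroupLeader;
	if (leader != NULL)
	{
		LWLock	   *leader_lwlock = LockHashPartitionLockByProc(leader);

		LWLockAcquire(leader_lwlock, LW_SHARED);
		/* membership can only change while this partition lock is held */
		if (proc->lockGroupLeader == leader)
		{
			values[29] = Int32GetDatum(leader->pid);
			nulls[29] = false;
		}
		LWLockRelease(leader_lwlock);
	}

Even this may not be fully consistent across rows (each row is handled independently), but it would at least close the window described above.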
I wasn't able to produce a crash, so maybe I missed something.
--
Justin