Re: Query on pg_stat_activity table got stuck

From: neeraj kumar <neeru(dot)cse(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Schneider <schnjere(at)amazon(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Query on pg_stat_activity table got stuck
Date: 2019-05-10 15:01:25
Message-ID: CAPR3Wj7gdw0BwwaeQ5sHbb_RmOeHddQbS2u8SCM=D5eBVRwFog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

There are multiple ways see this problem. One way I am seeing is : how
system will auto-recover from this particular state.

So ideally if st_procpid is set to zero it means this process is already
terminated, however it might be have left some corrupted information in
memory. So when other components tries to read beentry, they should also
check if st_procpid is already set to zero, if yes it means this process is
gone and no need to consider this process any more.

Agree this is solving particular issue about pg_stat_activity however don't
see any harm in adding that check.

On Thu, May 9, 2019 at 9:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> neeraj kumar <neeru(dot)cse(at)gmail(dot)com> writes:
> > Tom, may be I didn't make my point clear.
> > There are two issues :
> > 1) Why this value was left as odd
>
> Because a function called by pgstat_bestart threw an error, is what
> I'm guessing.
>
> > 2) Why backend entry is still pending in beentry for backend process even
> > after it was killed/terminated.
> > I am talking about 2nd issue. My understanding is query on
> pg_stat_activity
> > goes via all backend entries via beentry and it finds this
> wrong/corrupted
> > entry. When a process terminates, ideally this backend entry into
> beentery
> > should have also been cleaned. But why this still there? Whose
> > responsibility it is to remove entry from beentry when process
> terminates ?
>
> If you're imagining that something takes an electronic hacksaw to shared
> memory and physically removes that array entry, you're mistaken. It's
> still going to be there. Now, there *is* cleanup code --- this bit of
> pgstat_beshutdown_hook is supposed to mark the process's entry as
> not-in-use:
>
> /*
> * Clear my status entry, following the protocol of bumping
> st_changecount
> * before and after. We use a volatile pointer here to ensure the
> * compiler doesn't try to get cute.
> */
> pgstat_increment_changecount_before(beentry);
>
> beentry->st_procpid = 0; /* mark invalid */
>
> pgstat_increment_changecount_after(beentry);
>
> However, if something had left st_changecount odd before we got to this,
> it'd still be odd afterwards. Sure, st_procpid is now zero, but that
> doesn't help readers because they're not supposed to believe st_procpid
> is valid unless the changecount is even.
>
> You could maybe argue that this cleanup code should be trying to ensure
> that the changecount is left even, but I don't think that's an appropriate
> response. If it's not even when we get here, we already screwed up very
> badly, because we were breaking the protocol for (potentially) as long as
> the process has been in existence. Moreover, if you're worried about
> corner cases where we did mess that up, what of corner cases where process
> exit fails before getting here? I think the right answer is to bring the
> hammer down as soon as we mess up, which is what the critical-section
> mechanism is for.
>
> regards, tom lane
>

--
-------------------------------------
Thanks
Neeraj Kumar,
+1 (206) 427-7267

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message neeraj kumar 2019-05-10 15:40:44 Re: Query on pg_stat_activity table got stuck
Previous Message Tom Lane 2019-05-10 04:29:18 Re: Query on pg_stat_activity table got stuck

Browse pgsql-general by date

  From Date Subject
Next Message Vijaykumar Jain 2019-05-10 15:29:52 logical replication initiate via manual pg_dump
Previous Message Prakash Ramakrishnan 2019-05-10 14:50:00 Re: perl path issue