Re: Query on pg_stat_activity table got stuck

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: neeraj kumar <neeru(dot)cse(at)gmail(dot)com>
Cc: Jeremy Schneider <schnjere(at)amazon(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Query on pg_stat_activity table got stuck
Date: 2019-05-10 04:29:18
Message-ID: 2586.1557462558@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

neeraj kumar <neeru(dot)cse(at)gmail(dot)com> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken. It's
still going to be there. Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

/*
* Clear my status entry, following the protocol of bumping st_changecount
* before and after. We use a volatile pointer here to ensure the
* compiler doesn't try to get cute.
*/
pgstat_increment_changecount_before(beentry);

beentry->st_procpid = 0; /* mark invalid */

pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards. Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response. If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence. Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here? I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

regards, tom lane

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message neeraj kumar 2019-05-10 15:01:25 Re: Query on pg_stat_activity table got stuck
Previous Message neeraj kumar 2019-05-10 03:34:37 Re: Query on pg_stat_activity table got stuck

Browse pgsql-general by date

  From Date Subject
Next Message Prakash Ramakrishnan 2019-05-10 09:24:39 Re: perl path issue
Previous Message neeraj kumar 2019-05-10 03:34:37 Re: Query on pg_stat_activity table got stuck