Re: Query on pg_stat_activity table got stuck

From: neeraj kumar <neeru(dot)cse(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Schneider <schnjere(at)amazon(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Query on pg_stat_activity table got stuck
Date: 2019-05-10 15:40:44
Message-ID: CAPR3Wj6E11q8L6YsXLdV_Q5qXXuDsgge_EnLzG_X6m0AX2Sp4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

Also curious why query on pg_stat_activity is considering terminated
process ? Irrespective of corrupted state or not, ideally query on
pg_stat_activity
should ignore already terminated process.
My 2 cents.

On Fri, May 10, 2019 at 8:01 AM neeraj kumar <neeru(dot)cse(at)gmail(dot)com> wrote:

> There are multiple ways see this problem. One way I am seeing is : how
> system will auto-recover from this particular state.
>
> So ideally if st_procpid is set to zero it means this process is already
> terminated, however it might be have left some corrupted information in
> memory. So when other components tries to read beentry, they should also
> check if st_procpid is already set to zero, if yes it means this process
> is gone and no need to consider this process any more.
>
> Agree this is solving particular issue about pg_stat_activity however
> don't see any harm in adding that check.
>
>
> On Thu, May 9, 2019 at 9:29 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> neeraj kumar <neeru(dot)cse(at)gmail(dot)com> writes:
>> > Tom, may be I didn't make my point clear.
>> > There are two issues :
>> > 1) Why this value was left as odd
>>
>> Because a function called by pgstat_bestart threw an error, is what
>> I'm guessing.
>>
>> > 2) Why backend entry is still pending in beentry for backend process
>> even
>> > after it was killed/terminated.
>> > I am talking about 2nd issue. My understanding is query on
>> pg_stat_activity
>> > goes via all backend entries via beentry and it finds this
>> wrong/corrupted
>> > entry. When a process terminates, ideally this backend entry into
>> beentery
>> > should have also been cleaned. But why this still there? Whose
>> > responsibility it is to remove entry from beentry when process
>> terminates ?
>>
>> If you're imagining that something takes an electronic hacksaw to shared
>> memory and physically removes that array entry, you're mistaken. It's
>> still going to be there. Now, there *is* cleanup code --- this bit of
>> pgstat_beshutdown_hook is supposed to mark the process's entry as
>> not-in-use:
>>
>> /*
>> * Clear my status entry, following the protocol of bumping
>> st_changecount
>> * before and after. We use a volatile pointer here to ensure the
>> * compiler doesn't try to get cute.
>> */
>> pgstat_increment_changecount_before(beentry);
>>
>> beentry->st_procpid = 0; /* mark invalid */
>>
>> pgstat_increment_changecount_after(beentry);
>>
>> However, if something had left st_changecount odd before we got to this,
>> it'd still be odd afterwards. Sure, st_procpid is now zero, but that
>> doesn't help readers because they're not supposed to believe st_procpid
>> is valid unless the changecount is even.
>>
>> You could maybe argue that this cleanup code should be trying to ensure
>> that the changecount is left even, but I don't think that's an appropriate
>> response. If it's not even when we get here, we already screwed up very
>> badly, because we were breaking the protocol for (potentially) as long as
>> the process has been in existence. Moreover, if you're worried about
>> corner cases where we did mess that up, what of corner cases where process
>> exit fails before getting here? I think the right answer is to bring the
>> hammer down as soon as we mess up, which is what the critical-section
>> mechanism is for.
>>
>> regards, tom lane
>>
>
>
> --
> -------------------------------------
> Thanks
> Neeraj Kumar,
> +1 (206) 427-7267
>

--
-------------------------------------
Thanks
Neeraj Kumar,
+1 (206) 427-7267

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2019-05-10 16:13:58 Re: Query on pg_stat_activity table got stuck
Previous Message neeraj kumar 2019-05-10 15:01:25 Re: Query on pg_stat_activity table got stuck

Browse pgsql-general by date

  From Date Subject
Next Message bb ddd 2019-05-10 15:56:41 Re: Question about Expected rows value in EXPLAIN output for Nested Loop node
Previous Message Vijaykumar Jain 2019-05-10 15:29:52 logical replication initiate via manual pg_dump