Re: Query on pg_stat_activity table got stuck

From: neeraj kumar <neeru(dot)cse(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Schneider <schnjere(at)amazon(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Query on pg_stat_activity table got stuck
Date: 2019-05-09 23:35:37
Message-ID: CAPR3Wj4af3Lpu61DUdB8eri3TQhHLhcKY-q2bTRVLKAmj8h1Ug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

We got more information about this issue. There is one backend process
still present into beentry which has changecount as odd value. However this
process is long gone/terminated. It means when this process was
killed/terminated its entry was not cleaned from beentry list. There seems
to be some shutdown hook which cleans beentry if process is
killed/terminated that somehow was not kicked off ?

These are some of the fields of corrupted beentry entry which is still
hanging :

st_changecount = 1407, st_procpid = 0, st_backendType = B_BACKEND,
st_proc_start_timestamp = 610236763633421, st_xact_start_timestamp =
0, st_clienthostname = 0x9000023d480 "", st_ssl = 1 '\001',
st_sslstatus = 0x90000c60f80, st_state =
STATE_IDLEINTRANSACTION_ABORTED,

On Thu, May 9, 2019 at 1:00 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jeremy Schneider <schnjere(at)amazon(dot)com> writes:
> > Seems to me that at a minimum, this loop shouldn't go on forever. Even
> > having an arbitrary, crazy high, hard-coded number of attempts before
> > failure (like a million) would be better than spinning on the CPU
> > forever - which is what we are seeing.
>
> I don't think it's the readers' fault. The problem is that the
> writer is violating the protocol. If we put an upper limit on
> the number of spin cycles on the reader side, we'll just be creating
> a new failure mode when a writer gets swapped out at the wrong moment.
>
> IMO we need to (a) get the failure-prone code out of the critical
> section, and then (b) fix the pgstat_increment_changecount macros
> so that the critical sections around these shmem changes really are
> critical sections (ie bump CritSectionCount). That way, if somebody
> makes the same mistake again, at least there'll be a pretty obvious
> failure rather than a lot of stuck readers.
>
> regards, tom lane
>

--
-------------------------------------
Thanks
Neeraj Kumar,
+1 (206) 427-7267

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2019-05-10 03:25:01 Re: Query on pg_stat_activity table got stuck
Previous Message Tom Lane 2019-05-09 20:00:52 Re: Query on pg_stat_activity table got stuck

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2019-05-10 01:20:36 Re: Oracle Migration Approach (Open source vs Vendor Specific)
Previous Message Bruno Lavoie 2019-05-09 23:10:49 Instructions to build from source RPMs