From: | neeraj kumar <neeru(dot)cse(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Jeremy Schneider <schnjere(at)amazon(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Query on pg_stat_activity table got stuck |
Date: | 2019-05-09 23:35:37 |
Message-ID: | CAPR3Wj4af3Lpu61DUdB8eri3TQhHLhcKY-q2bTRVLKAmj8h1Ug@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin pgsql-general |
We got more information about this issue. There is one backend process
still present into beentry which has changecount as odd value. However this
process is long gone/terminated. It means when this process was
killed/terminated its entry was not cleaned from beentry list. There seems
to be some shutdown hook which cleans beentry if process is
killed/terminated that somehow was not kicked off ?
These are some of the fields of corrupted beentry entry which is still
hanging :
st_changecount = 1407, st_procpid = 0, st_backendType = B_BACKEND,
st_proc_start_timestamp = 610236763633421, st_xact_start_timestamp =
0, st_clienthostname = 0x9000023d480 "", st_ssl = 1 '\001',
st_sslstatus = 0x90000c60f80, st_state =
STATE_IDLEINTRANSACTION_ABORTED,
On Thu, May 9, 2019 at 1:00 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jeremy Schneider <schnjere(at)amazon(dot)com> writes:
> > Seems to me that at a minimum, this loop shouldn't go on forever. Even
> > having an arbitrary, crazy high, hard-coded number of attempts before
> > failure (like a million) would be better than spinning on the CPU
> > forever - which is what we are seeing.
>
> I don't think it's the readers' fault. The problem is that the
> writer is violating the protocol. If we put an upper limit on
> the number of spin cycles on the reader side, we'll just be creating
> a new failure mode when a writer gets swapped out at the wrong moment.
>
> IMO we need to (a) get the failure-prone code out of the critical
> section, and then (b) fix the pgstat_increment_changecount macros
> so that the critical sections around these shmem changes really are
> critical sections (ie bump CritSectionCount). That way, if somebody
> makes the same mistake again, at least there'll be a pretty obvious
> failure rather than a lot of stuck readers.
>
> regards, tom lane
>
--
-------------------------------------
Thanks
Neeraj Kumar,
+1 (206) 427-7267
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-05-10 03:25:01 | Re: Query on pg_stat_activity table got stuck |
Previous Message | Tom Lane | 2019-05-09 20:00:52 | Re: Query on pg_stat_activity table got stuck |
From | Date | Subject | |
---|---|---|---|
Next Message | Bruce Momjian | 2019-05-10 01:20:36 | Re: Oracle Migration Approach (Open source vs Vendor Specific) |
Previous Message | Bruno Lavoie | 2019-05-09 23:10:49 | Instructions to build from source RPMs |