Re: Query on pg_stat_activity table got stuck

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: neeraj kumar <neeru(dot)cse(at)gmail(dot)com>
Cc: Jeremy Schneider <schnjere(at)amazon(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org
Subject: Re: Query on pg_stat_activity table got stuck
Date: 2019-05-11 02:57:20
Message-ID: 5979.1557543440@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-general

neeraj kumar <neeru(dot)cse(at)gmail(dot)com> writes:
> --> But the right answer is to fix it on the writing side.

> Yes I agree with this part. Even though there is very low probability, a
> process can still be killed in middle when writing. So what is your
> suggestion on how to recover from this automatically?

Here's a draft patch. This makes more cosmetic changes than are strictly
necessary to fix the bug, but I'm concerned about preventing future
mistakes of the same sort, not just dealing with the immediate bug.

Anyway the idea here is (1) reduce the size of the critical section in
pgstat_bestart where the changecount is odd, so that nothing except
data copying happens there, and (2) adjust the macros for writing
st_changecount so that those sections actually are critical sections.
That means that if something throws an error inside those sections,
it'll turn into a PANIC and database restart. That shouldn't ever happen,
of course, but if it does then a PANIC is better than a frozen system.
I renamed the macros for writing st_changecount to make it more apparent
that those are now critical-section boundaries.

The out-of-line string fields make the pgstat_bestart code more ticklish
than one could wish, since they have to be treated differently from
in-line fields. But I'm not sure there's much we can do to avoid that.
It doesn't seem like a good idea to change the layout of the shared-
memory structure, at least not in released branches.

I found two other embarrassing bugs while I was at it. The stanza
initializing st_backendType was clearly inserted with the aid of a
dartboard, because it was actually before the initial st_changecount bump.
That probably has little if any real-world impact, but it's still an
indication of sloppy patching. And pgstat_read_current_status had not
been taught to copy the st_gssstatus out-of-line structure to local
storage, so that those values might fail to hold still while a
transaction examines the pg_stat_activity data. That *is* a live bug.

This patch is against HEAD --- I've not looked at how much adjustment
it'll need for the back branches, but I'm sure there's some.

regards, tom lane

Attachment Content-Type Size
pg-stat-activity-fixes-1.patch text/x-diff 22.8 KB

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Tom Lane 2019-05-12 01:36:47 Re: Query on pg_stat_activity table got stuck
Previous Message neeraj kumar 2019-05-10 20:32:54 Re: Query on pg_stat_activity table got stuck

Browse pgsql-general by date

  From Date Subject
Next Message Glen Warnes 2019-05-11 03:13:17 pgAdmin4 help system non-functional after install
Previous Message Prakash Ramakrishnan 2019-05-11 01:49:23 Re: perl path issue