From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | neeraj kumar <neeru(dot)cse(at)gmail(dot)com> |
Cc: | Jeremy Schneider <schnjere(at)amazon(dot)com>, pgsql-admin(at)postgresql(dot)org, pgsql-general(at)postgresql(dot)org |
Subject: | Re: Query on pg_stat_activity table got stuck |
Date: | 2019-05-11 02:57:20 |
Message-ID: | 5979.1557543440@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-admin pgsql-general |
neeraj kumar <neeru(dot)cse(at)gmail(dot)com> writes:
> --> But the right answer is to fix it on the writing side.
> Yes I agree with this part. Even though there is very low probability, a
> process can still be killed in middle when writing. So what is your
> suggestion on how to recover from this automatically?
Here's a draft patch. This makes more cosmetic changes than are strictly
necessary to fix the bug, but I'm concerned about preventing future
mistakes of the same sort, not just dealing with the immediate bug.
Anyway the idea here is (1) reduce the size of the critical section in
pgstat_bestart where the changecount is odd, so that nothing except
data copying happens there, and (2) adjust the macros for writing
st_changecount so that those sections actually are critical sections.
That means that if something throws an error inside those sections,
it'll turn into a PANIC and database restart. That shouldn't ever happen,
of course, but if it does then a PANIC is better than a frozen system.
I renamed the macros for writing st_changecount to make it more apparent
that those are now critical-section boundaries.
The out-of-line string fields make the pgstat_bestart code more ticklish
than one could wish, since they have to be treated differently from
in-line fields. But I'm not sure there's much we can do to avoid that.
It doesn't seem like a good idea to change the layout of the shared-
memory structure, at least not in released branches.
I found two other embarrassing bugs while I was at it. The stanza
initializing st_backendType was clearly inserted with the aid of a
dartboard, because it was actually before the initial st_changecount bump.
That probably has little if any real-world impact, but it's still an
indication of sloppy patching. And pgstat_read_current_status had not
been taught to copy the st_gssstatus out-of-line structure to local
storage, so that those values might fail to hold still while a
transaction examines the pg_stat_activity data. That *is* a live bug.
This patch is against HEAD --- I've not looked at how much adjustment
it'll need for the back branches, but I'm sure there's some.
regards, tom lane
Attachment | Content-Type | Size |
---|---|---|
pg-stat-activity-fixes-1.patch | text/x-diff | 22.8 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2019-05-12 01:36:47 | Re: Query on pg_stat_activity table got stuck |
Previous Message | neeraj kumar | 2019-05-10 20:32:54 | Re: Query on pg_stat_activity table got stuck |
From | Date | Subject | |
---|---|---|---|
Next Message | Glen Warnes | 2019-05-11 03:13:17 | pgAdmin4 help system non-functional after install |
Previous Message | Prakash Ramakrishnan | 2019-05-11 01:49:23 | Re: perl path issue |