From: | Rainer Tammer <pgsql(at)spg(dot)schulergroup(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Postgres 9.2.13 on AIX 7.1 |
Date: | 2021-08-25 16:04:17 |
Message-ID: | 5e4f9356-26cc-bd75-4f82-92d26ce575f7@spg.schulergroup.com |
Lists: | pgsql-bugs |
Hello,
I ran the server with autovacuum disabled for ~24 h, with no server
shutdown.
After re-enabling autovacuum, the server died in less than 9 hours:
Started: 2021-08-25 08:12:29
Died: 2021-08-25 16:22:33
At the time of the shutdown nothing was accessing the server:
no running applications and no psql CLI sessions.
I will let it run overnight and see if the server goes down again.
There is no software installed on this AIX LPAR which uses this instance
or sends signals to the server.
I only interacted with it occasionally during the day to check that the
server was working correctly.
Unfortunately, I cannot see from the main process which other
process sent the SIGINT.
This is the only correlation I see:
2021-08-25 16:22:27 CEST DEBUG: server process (PID 19005776) exited with exit code 0
2021-08-25 16:22:33 CEST DEBUG: postmaster received signal 2
2021-08-25 16:22:33 CEST LOG: received fast shutdown request
2021-08-25 16:22:33 CEST LOG: aborting any active transactions
2021-08-25 16:22:33 CEST LOG: autovacuum launcher shutting down
The time gap is 6 s, so the shutdown may be too far from the last
process exit to be related.
I could migrate the test DB to 9.6.23 and see if the problem persists.
Would it be worth adding additional code before every signal to trace
the source ID and the target PID as well as the source/target process name?
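As a rough illustration of what such tracing could capture: POSIX passes the sender's PID and UID along with a signal (`si_pid` and `si_uid` in `siginfo_t`). The sketch below is a standalone Python demonstration of that mechanism on a Unix system, not PostgreSQL code. The function names are hypothetical, and the process signals itself as a stand-in for the unknown outside sender.

```python
import os
import signal


def wait_and_identify_sender(signum, timeout=5.0):
    """Wait for a blocked signal; return (signal number, sender PID, sender UID).

    Returns None if nothing arrives within `timeout` seconds.
    The caller must have blocked `signum` beforehand.
    """
    info = signal.sigtimedwait({signum}, timeout)
    if info is None:
        return None
    return info.si_signo, info.si_pid, info.si_uid


def demo():
    signum = signal.SIGINT
    # Block SIGINT so it stays pending instead of running a handler.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signum})
    try:
        # Stand-in for the unknown outside sender: signal ourselves.
        os.kill(os.getpid(), signum)
        return wait_and_identify_sender(signum)
    finally:
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signum})


if __name__ == "__main__":
    print(demo())
```

In a C server the equivalent would presumably be a handler installed with `sigaction` and `SA_SIGINFO`. Whether that is practical inside the postmaster is a separate question that this sketch does not answer.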
The OS is at the latest patch level.
The compiler is at the latest patch level.
The 9.2.x is at the latest patch level.
I can also run a trace tomorrow; this would give me some information.
Sample output (shortened):
Wed Aug 25 17:58:51 2021
System: AIX 7.2 Node: host
Machine: 000000000000
Internet Address: 00000000 1.1.1.1
At trace startup, the system contained 16 cpus, of which 16 were traced.
Buffering: Kernel Heap
This is from a 64-bit kernel.
Tracing only these hooks, 14e0
ID PROCESS NAME PID TID I SYSTEM CALL ELAPSED_SEC DELTA_MSEC APPL SYSCALL KERNEL INTERRUPT
001 trace 23789978 87687537 0.000000000 0.000000 TRACE ON channel 0
Wed Aug 25 17:58:51 2021
14E postgres: 18743746 85000571 7.903995939 2994.175459 kill: signal SIGUSR1 to process 25166296 postgres
14E --1- -1 85393753 7.904962367 0.966428 kill: signal SIGUSR2 to process 18743746 postgres:
14E --1- -1 85393753 7.946566507 41.604140 kill: signal SIGUSR2 to process 18743746 postgres:
14E postgres: 18743746 85000571 17.902007437 2992.131623 kill: signal SIGUSR1 to process 25166296 postgres
14E --1- -1 94437835 17.903004949 0.997512 kill: signal SIGUSR2 to process 18743746 postgres:
14E --1- -1 94437835 17.935897005 32.892056 kill: signal SIGUSR2 to process 18743746 postgres:
14E postgres: 18743746 85000571 28.001327251 3091.401199 kill: signal SIGUSR1 to process 25166296 postgres
14E --1- -1 40042983 28.002307781 0.980530 kill: signal SIGUSR2 to process 18743746 postgres:
14E --1- -1 40042983 28.032432646 30.124865 kill: signal SIGUSR2 to process 18743746 postgres:
14E postgres: 18743746 85000571 37.901060572 2991.083160 kill: signal SIGUSR1 to process 25166296 postgres
14E --1- -1 88539511 37.902072470 1.011898 kill: signal SIGUSR2 to process 18743746 postgres:
14E --1- -1 88539511 37.936426058 34.353588 kill: signal SIGUSR2 to process 18743746 postgres:
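A trace like this could be searched automatically for the one signal that matters here. The following is a minimal Python sketch. It assumes each kill event sits on a single line in the column order of the sample (hook ID, process name, PID, TID, elapsed seconds, delta ms, kill text); the regular expression and the function names are illustrative, not part of any AIX tool.

```python
import re

# Assumed layout of one kill event, taken from the sample output:
# hook ID, process name, PID, TID, elapsed seconds, delta ms, kill text.
KILL_EVENT = re.compile(
    r"^14E\s+(?P<sender_name>\S+)\s+(?P<sender_pid>-?\d+)\s+(?P<tid>\d+)\s+"
    r"(?P<elapsed_sec>\d+\.\d+)\s+(?P<delta_msec>\d+\.\d+)\s+"
    r"kill:\s+signal\s+(?P<signal>\w+)\s+to\s+process\s+(?P<target_pid>\d+)"
    r"(?:\s+(?P<target_name>\S+))?"
)

# Two events copied from the sample output, one per line.
SAMPLE = [
    "14E postgres: 18743746 85000571 7.903995939 2994.175459 "
    "kill: signal SIGUSR1 to process 25166296 postgres",
    "14E --1- -1 85393753 7.904962367 0.966428 "
    "kill: signal SIGUSR2 to process 18743746 postgres:",
]


def parse_kill_events(lines):
    """Return one dict per line that matches the assumed kill-event layout."""
    events = []
    for line in lines:
        match = KILL_EVENT.match(line.strip())
        if match is None:
            continue  # headers, date stamps, and other hooks are skipped
        event = match.groupdict()
        event["sender_pid"] = int(event["sender_pid"])
        event["target_pid"] = int(event["target_pid"])
        event["elapsed_sec"] = float(event["elapsed_sec"])
        events.append(event)
    return events


def signals_to(events, target_pid, signal_name):
    """Filter events by receiving PID and signal name."""
    return [
        e for e in events
        if e["target_pid"] == target_pid and e["signal"] == signal_name
    ]


if __name__ == "__main__":
    for event in signals_to(parse_kill_events(SAMPLE), 18743746, "SIGUSR2"):
        print(event["sender_name"], event["sender_pid"], event["elapsed_sec"])
```

Calling `signals_to` with the postmaster's PID and `"SIGINT"` on a full trace would list the candidate senders, provided the trace is running when the shutdown happens.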
I do not observe this with V8.x servers.
This stupid problem is getting on my nerves!
Bye
Rainer
On 25.08.2021 17:13, Tom Lane wrote:
> Rainer Tammer <pgsql(at)spg(dot)schulergroup(dot)com> writes:
>> 2021-08-25 16:22:33 CEST DEBUG: postmaster received signal 2
>> 2021-08-25 16:22:33 CEST LOG: received fast shutdown request
> Well, something sent the postmaster SIGINT. There isn't any
> mechanism within Postgres itself that would do that; you need
> to look for outside causes.
>
> regards, tom lane
>