Re: Postgres 9.2.13 on AIX 7.1

From: Rainer Tammer <pgsql(at)spg(dot)schulergroup(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Postgres 9.2.13 on AIX 7.1
Date: 2021-08-25 16:04:17
Message-ID: 5e4f9356-26cc-bd75-4f82-92d26ce575f7@spg.schulergroup.com
Lists: pgsql-bugs

Hello,
I ran the server with autovacuum disabled for ~24h with no server
shutdown.
After re-enabling autovacuum the server died in less than 9 hours:

Started: 2021-08-25 08:12:29
Died: 2021-08-25 16:22:33

During the time of the shutdown there was no access to the server.
No running applications and no psql cli sessions.

I will let it run overnight and see if the server goes down again.
There is no software installed on this AIX LPAR which uses this instance
or sends signals to the server.
I only interacted with it a few times during the day to check that the
server was working correctly.
Unfortunately I cannot see from the main process which other process
sent the SIGINT.

This is the only correlation I see:

2021-08-25 16:22:27 CEST  DEBUG:  server process (PID 19005776) exited with exit code 0
2021-08-25 16:22:33 CEST  DEBUG:  postmaster received signal 2
2021-08-25 16:22:33 CEST  LOG:  received fast shutdown request
2021-08-25 16:22:33 CEST  LOG:  aborting any active transactions
2021-08-25 16:22:33 CEST  LOG:  autovacuum launcher shutting down

The gap between the two events is 6 s, so the signal may be too far
from the last process exit for the two to be related.
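
The 6 s figure is just the difference between the two log timestamps. A minimal sketch of that check, assuming the log prefix starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the lines above (the helper name log_time is made up for illustration, and the time zone is ignored because both lines carry the same one):

```python
from datetime import datetime


def log_time(line: str) -> datetime:
    # The first 19 characters hold the date and time, e.g. "2021-08-25 16:22:27".
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


child_exit = ("2021-08-25 16:22:27 CEST  DEBUG:  server process "
              "(PID 19005776) exited with exit code 0")
sigint = "2021-08-25 16:22:33 CEST  DEBUG:  postmaster received signal 2"

# Seconds between the last child exit and the postmaster receiving signal 2.
gap = (log_time(sigint) - log_time(child_exit)).total_seconds()
print(gap)  # 6.0
```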

I could migrate the test DB to 9.6.23 and see if the problem persists.
Would it be worth adding additional code before every signal to trace
the source PID and the target PID as well as the source/target process name?

The OS is at the latest patch level.
The compiler is at the latest patch level.
The 9.2.x is at the latest patch level.

I can also run a trace tomorrow; this would give me some information.

Sample output (shortened):

Wed Aug 25 17:58:51 2021
System: AIX 7.2 Node: host
Machine: 000000000000
Internet Address: 00000000 1.1.1.1
At trace startup, the system contained 16 cpus, of which 16 were traced.
Buffering: Kernel Heap
This is from a 64-bit kernel.
Tracing only these hooks, 14e0

ID   PROCESS NAME   PID      TID      I SYSTEM CALL ELAPSED_SEC    DELTA_MSEC   APPL    SYSCALL KERNEL  INTERRUPT

001  trace          23789978 87687537               0.000000000    0.000000     TRACE ON channel 0
Wed Aug 25 17:58:51 2021
14E  postgres:      18743746 85000571               7.903995939    2994.175459  kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       85393753               7.904962367    0.966428     kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       85393753               7.946566507    41.604140    kill: signal SIGUSR2 to process 18743746 postgres:
14E  postgres:      18743746 85000571               17.902007437   2992.131623  kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       94437835               17.903004949   0.997512     kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       94437835               17.935897005   32.892056    kill: signal SIGUSR2 to process 18743746 postgres:
14E  postgres:      18743746 85000571               28.001327251   3091.401199  kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       40042983               28.002307781   0.980530     kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       40042983               28.032432646   30.124865    kill: signal SIGUSR2 to process 18743746 postgres:
14E  postgres:      18743746 85000571               37.901060572   2991.083160  kill: signal SIGUSR1 to process 25166296 postgres
14E  --1-           -1       88539511               37.902072470   1.011898     kill: signal SIGUSR2 to process 18743746 postgres:
14E  --1-           -1       88539511               37.936426058   34.353588    kill: signal SIGUSR2 to process 18743746 postgres:
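
A report like this gets long quickly, so it may help to filter it for the one signal of interest. The following is only a sketch, assuming the report keeps the shape shown above (hook ID 14E, then process name, PID, TID, elapsed seconds, delta, and a "kill: signal ... to process ..." text). The names KillEvent, parse_kill_events and senders_of are made up for illustration. The pattern tolerates records that are wrapped over several lines.

```python
import re
from typing import List, NamedTuple


class KillEvent(NamedTuple):
    sender_name: str
    sender_pid: int      # shown as -1 for some senders in the sample report
    elapsed_sec: float
    signal: str
    target_pid: int


# \s+ also matches newlines, so a record split across lines still matches.
KILL_RE = re.compile(
    r"\b14E\s+(\S+)\s+(-?\d+)\s+\d+\s+([\d.]+)\s+[\d.]+\s+"
    r"kill: signal (\w+) to process (\d+)"
)


def parse_kill_events(report: str) -> List[KillEvent]:
    """Extract every kill record found in the trace report text."""
    return [
        KillEvent(m[1], int(m[2]), float(m[3]), m[4], int(m[5]))
        for m in KILL_RE.finditer(report)
    ]


def senders_of(events: List[KillEvent], signal: str, target_pid: int) -> List[KillEvent]:
    """Return the records where `signal` was sent to `target_pid`."""
    return [e for e in events if e.signal == signal and e.target_pid == target_pid]


SAMPLE = """\
14E  postgres:      18743746 85000571 7.903995939
2994.175459                   kill: signal SIGUSR1 to process 25166296
postgres
14E  --1-           -1       85393753 7.904962367
0.966428                   kill: signal SIGUSR2 to process 18743746
postgres:
"""

events = parse_kill_events(SAMPLE)
for e in senders_of(events, "SIGUSR2", 18743746):
    print(e.sender_name, e.sender_pid, e.elapsed_sec)
```

To look for the shutdown, the same call would be made with "SIGINT" and the postmaster's PID once a trace covering the event is available.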

I do not observe this with V8.x servers.

This stupid problem is getting on my nerves!!

Bye
  Rainer

On 25.08.2021 17:13, Tom Lane wrote:
> Rainer Tammer <pgsql(at)spg(dot)schulergroup(dot)com> writes:
>> 2021-08-25 16:22:33 CEST  DEBUG: postmaster received signal 2
>> 2021-08-25 16:22:33 CEST  LOG:  received fast shutdown request
> Well, something sent the postmaster SIGINT. There isn't any
> mechanism within Postgres itself that would do that; you need
> to look for outside causes.
>
> regards, tom lane
>
