| From: | "K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com> | 
|---|---|
| To: | Craig Ringer <craig(at)2ndquadrant(dot)com> | 
| Cc: | pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "T, Rasna (Nokia - IN/Bangalore)" <rasna(dot)t(at)nokia(dot)com>, "Itnal, Prakash (Nokia - IN/Bangalore)" <prakash(dot)itnal(at)nokia(dot)com> | 
| Subject: | Re: Postgres process invoking exit resulting in sh-QUIT core | 
| Date: | 2017-07-07 07:10:08 | 
| Message-ID: | AM5PR0701MB264218C4D4068BEF61B59464D6AA0@AM5PR0701MB2642.eurprd07.prod.outlook.com | 
| Lists: | pgsql-bugs pgsql-hackers | 
Hi Craig,
You were right about the restore_command.
We found the following logs:
Jul  6 18:39:21.527584 notice CFPU-1 postgres[5154]: [6-1] ERROR:  could not receive data from WAL stream: server closed the connection unexpectedly
Jul  6 18:39:21.527584 notice CFPU-1 postgres[5154]: [6-2] This probably means the server terminated abnormally
Jul  6 18:39:21.528072 notice CFPU-1 postgres[5154]: [6-3] before or while processing the request.
Jul  6 18:39:21.528072 notice CFPU-1 postgres[5154]: [6-4]
Jul  6 18:39:21.528072 err CFPU-1 postgres[5154]: [7-1] FATAL:  socket not open
Jul  6 18:39:21.536842 info CFPU-1 postgres[5097]: [5-1] LOG:  received immediate shutdown request
Jul  6 18:39:21.537460 notice CFPU-1 postgres[5198]: [5-1] WARNING:  terminating connection because of crash of another server process
Jul  6 18:39:21.537928 notice CFPU-1 postgres[5198]: [5-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Jul  6 18:39:21.538410 notice CFPU-1 postgres[5198]: [5-3] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Jul  6 18:39:21.539684 notice CFPU-1 postgres[5157]: [5-1] WARNING:  terminating connection because of crash of another server process
Jul  6 18:39:21.539684 notice CFPU-1 postgres[5157]: [5-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
Jul  6 18:39:21.540016 notice CFPU-1 postgres[5157]: [5-3] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
Jul  6 18:39:21.544922 info CFPU-1 FSPostgresWD: postgres db server exited with error 0
Jul  6 18:39:22.178637 err CFPU-1 postgres[5101]: [20-1] FATAL:  could not restore file "0000000300000002000001CB" from archive: return code 131
Immediately after this, the sh-QUIT core is generated. We searched for the meaning of return code 131 from the restore, but did not find any helpful information.
Also, the issue does not occur with the fast shutdown method; it occurs only with the immediate shutdown method.
We collected the dmesg output, but it did not help much. Please find attached the dmesg output from when the crash occurred.
The startup process invokes the shell as shown below:
4518      5101  0.1  0.0 155968  2056 ?        Ss   18:32   0:00 postgres: startup process   waiting for 0000000300000002000001CB
4518      7919  0.0  0.0   3600   680 ?        S    18:39   0:00  \_ sh -c exit 1
Can you help us debug the issue further?
Regards,
Sandhya
From: Craig Ringer [mailto:craig(at)2ndquadrant(dot)com]
Sent: Wednesday, July 05, 2017 6:20 PM
To: K S, Sandhya (Nokia - IN/Bangalore) <sandhya(dot)k_s(at)nokia(dot)com>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>; PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>; T, Rasna (Nokia - IN/Bangalore) <rasna(dot)t(at)nokia(dot)com>; Itnal, Prakash (Nokia - IN/Bangalore) <prakash(dot)itnal(at)nokia(dot)com>
Subject: RE: [HACKERS] Postgres process invoking exit resulting in sh-QUIT core
On 3 Jul. 2017 23:01, "K S, Sandhya (Nokia - IN/Bangalore)" <sandhya(dot)k_s(at)nokia(dot)com> wrote:
Hi Craig,
Thanks for the response.
The scenario tried here is restarting the system multiple times. The sh-QUIT core is generated when Postgres invokes the shell to run exit, and may not be due to kernel or file system issues. I will try to reproduce the issue while capturing dmesg output.
However, is there any case in which Postgres invokes 'sh -c exit 1'?
Most likely it's used directly or indirectly by an archive_command or restore_command you have configured.
| Attachment | Content-Type | Size | 
|---|---|---|
| lock_dmesg_3.log | application/octet-stream | 246.3 KB | 