Re: Replication failed

From: Sreejith P <sreejith(at)lifetrenz(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: Replication failed
Date: 2020-12-18 10:19:02
Message-ID: PS2PR02MB2837F275523C57499D567C97F3C30@PS2PR02MB2837.apcprd02.prod.outlook.com
Lists: pgsql-admin

Yes.

There was a crash.

The master recovered automatically, but replication failed.

See the log below from the master.

cp: error writing '/BackupVolume/hisDbBackup/WALs/0000000200000AE6000000B9': No space left on device
2020-12-17 21:55:26 +04 [55822]: user=,db=,app=,client= LOG: archive command failed with exit code 1
2020-12-17 21:55:26 +04 [55822]: user=,db=,app=,client= DETAIL: The failed archive command was: test ! -f /BackupVolume/dbWALBack/WALs/0000000200000AE6000000B9 && cp pg_wal/0000000200000AE6000000B9 /BackupVolume/hisDbBackup/WALs/0000000200000AE6000000B9

2020-12-17 21:53:56 +04 [84225]: user=AstDBA,db=AST-PROD,app=[unknown],client=172.18.200.100 HINT: In a moment you should be able to reconnect to the database and repeat your command.
2020-12-17 21:53:56 +04 [51102]: user=AstDBA,db=AST-PROD,app=[unknown],client=172.18.200.100 WARNING: terminating connection because of crash of another server process
2020-12-17 21:53:56 +04 [51102]: user=AstDBA,db=AST-PROD,app=[unknown],client=172.18.200.100 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.


2020-12-17 21:55:20 +04 [55808]: user=replicator,db=[unknown],app=[unknown],client=172.18.200.144 FATAL: the database system is in recovery mode


2020-12-17 21:55:25 +04 [55828]: user=replicator,db=[unknown],app=walreceiver,client=172.18.200.148 ERROR: requested starting point AEA/A2000000 is ahead of the WAL flush position of this server AEA/A1FFFA50
2020-12-17 21:55:25 +04 [55827]: user=replicator,db=[unknown],app=walreceiver,client=172.18.200.147 ERROR: requested starting point AEA/A2000000 is ahead of the WAL flush position of this server AEA/A1FFFA50
2020-12-17 21:55:25 +04 [55829]: user=replicator,db=[unknown],app=walreceiver,client=172.18.200.145 ERROR: requested starting point AEA/A2000000 is ahead of the WAL flush position of this server AEA/A1FFFA50
2020-12-17 21:55:25 +04 [55830]: user=replicator,db=[unknown],app=walreceiver,client=172.18.200.146 ERROR: requested starting point AEA/A2000000 is ahead of the WAL flush position of this server AEA/A1FFFA50
2020-12-17 21:55:25 +04 [55831]: user=replicator,db=[unknown],app=walreceiver,client=172.18.200.144 ERROR: requested starting point AEA/A2000000 is ahead of the WAL flush position of this server AEA/A1FFFA50
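
To put the positions in these messages side by side: an LSN is printed as two hexadecimal numbers, the high and low 32 bits of a byte position in the WAL, so the values can be compared with plain integer arithmetic. The sketch below is only an illustration using the three LSNs quoted in this thread; `parse_lsn` is a hypothetical helper name, not a PostgreSQL function.

```python
def parse_lsn(lsn: str) -> int:
    """Convert an LSN string such as 'AEA/A2000000' to a 64-bit byte position.

    Hypothetical helper for illustration; the part before the slash is the
    high 32 bits, the part after it is the low 32 bits, both in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


# Values copied from the log lines in this thread.
requested = parse_lsn("AEA/A2000000")   # starting point requested by the standbys
flushed = parse_lsn("AEA/A1FFFA50")     # WAL flush position reported by the master
contrecord = parse_lsn("AEA/A1FFF9E0")  # position in the "invalid contrecord length" message

# How far the standbys' requested start is ahead of what the master has flushed.
gap_bytes = requested - flushed

# How far the reported contrecord position lies before the master's flush position.
before_flush_bytes = flushed - contrecord

print(f"requested start is {gap_bytes} bytes ahead of the flush position")
print(f"contrecord position is {before_flush_bytes} bytes before the flush position")
```

All three positions fall within a couple of kilobytes of each other, which only shows that the standbys and the master disagree about the end of the WAL at about the same spot. It does not by itself identify the root cause.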

On 18/12/20, 2:13 PM, "Laurenz Albe" <laurenz(dot)albe(at)cybertec(dot)at> wrote:

On Fri, 2020-12-18 at 00:56 +0530, Sreejith P wrote:
> We had a 1 master x 4 slave server streaming replication setup using Postgres 10. It had been working successfully for a long time.
>
> Suddenly all the replication servers failed, and they are showing the following message.
>
> 2020-12-17 22:24:32 +04 [1587]: [357-1] db=,user=LOG: invalid contrecord length 2722 at AEA/A1FFF9E0.
>
> Requesting help for identifying root cause.

Looks like this problem:
https://postgr.es/m/77734732-44A4-4209-8C2F-3AF36C9D4D18%40amazon.com

Was there a crash on the primary server?

There is a patch under development at this thread:
https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com

Yours,
Laurenz Albe

