Re: Trying to handle db corruption 9.6

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Mariel Cherkassky <mariel(dot)cherkassky(at)gmail(dot)com>
Cc: Bimal <internetuser2008(at)yahoo(dot)com>, Greg Clough <Greg(dot)Clough(at)ihsmarkit(dot)com>, PostgreSQL mailing lists <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Trying to handle db corruption 9.6
Date: 2019-05-20 21:04:06
Message-ID: 20190520210406.5paaijqd5no6imez@development
Lists: pgsql-admin pgsql-performance

On Mon, May 20, 2019 at 08:20:33PM +0300, Mariel Cherkassky wrote:
> Hey Greg,
> Basically my backup was made after the first pg_resetxlog so I was wrong.

Bummer.

> However, the customer had a secondary machine that wasn't synced for a
> month. I have all the WALs since the moment the secondary went out of
> sync. Once I started it I hoped that it would start recovering the WALs
> and fill the gap. However, I got an error on the secondary:
> 2019-05-20 10:11:28 PDT  19021  LOG:  entering standby mode
> 2019-05-20 10:11:28 PDT  19021  LOG:  invalid primary checkpoint record
> 2019-05-20 10:11:28 PDT  19021  LOG:  invalid secondary checkpoint link in
> control file
> 2019-05-20 10:11:28 PDT  19021  PANIC:  could not locate a valid
> checkpoint record
> 2019-05-20 10:11:28 PDT  19018  LOG:  startup process (PID 19021) was
> terminated by signal 6: Aborted
> 2019-05-20 10:11:28 PDT  19018  LOG:  aborting startup due to startup
> process failure
> 2019-05-20 10:11:28 PDT  19018  LOG:  database system is shut down.
>
> I checked my secondary archive dir and pg_xlog dir and it seems that
> the restore command doesn't work. My restore_command:
> restore_command = 'rsync -avzhe ssh
> postgres(at)x(dot)x(dot)x(dot)x:/var/lib/pgsql/archive/%f /var/lib/pgsql/archive/%f ;
> gunzip < /var/lib/pgsql/archive/%f > %p'
> archive_cleanup_command = '/usr/pgsql-9.6/bin/pg_archivecleanup
> /var/lib/pgsql/archive %r'

Well, when you say it does not work, why do you think so? Does it print
some error, or what? Does it even get executed? It does not seem to be
the case, judging by the log (there's no restore_command message).
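
One way to find out whether the command runs at all is to put the restore
step behind a small wrapper that logs every invocation and returns a
non-zero status on failure. The following is only a sketch, not a tested
recovery procedure: the script name, the RESTORE_LOG and WAL_ARCHIVE_DIR
variables and the log path are hypothetical, and it covers only the local
gunzip step, assuming the rsync fetch has already placed the compressed
segment in the archive directory.

```shell
#!/bin/sh
# Hypothetical wrapper for restore_command (names and paths are assumptions).
# Assumed usage in recovery.conf:
#   restore_command = '/path/to/restore_wal.sh %f %p'

restore_wal() {
    f="$1"   # WAL file name requested by the server (%f)
    p="$2"   # path the server wants the file restored to (%p)
    src_dir="${WAL_ARCHIVE_DIR:-/var/lib/pgsql/archive}"
    log="${RESTORE_LOG:-/tmp/restore_wal.log}"

    # Log every call, so an empty log means the command never ran.
    echo "$(date '+%Y-%m-%d %H:%M:%S') restore requested: $f -> $p" >> "$log"

    if [ ! -f "$src_dir/$f" ]; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') not in archive: $f" >> "$log"
        return 1
    fi

    # Same decompression as the original command, but with the status checked.
    if ! gunzip < "$src_dir/$f" > "$p"; then
        echo "$(date '+%Y-%m-%d %H:%M:%S') gunzip failed: $f" >> "$log"
        rm -f "$p"
        return 1
    fi

    echo "$(date '+%Y-%m-%d %H:%M:%S') restored: $f" >> "$log"
    return 0
}

# Run only when called with the two expected arguments.
if [ "$#" -eq 2 ]; then
    restore_wal "$1" "$2"
fi
```

If the log file stays empty after a startup attempt, the server never got
as far as calling the command, which would match the log quoted above.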

How was the "secondary machine" created? You said you have all the WAL
since then - how do you know that?
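
A rough way to check the "all the WAL" claim is to look for holes in the
sequence of segment file names in the archive. The sketch below is an
illustration under assumptions, not something taken from this setup: the
wal_gaps function name is made up, and it assumes the default 16MB segment
size (256 segments per log id), a single timeline, and file names that
start with the usual 24 hex digits (a suffix such as .gz is ignored).

```shell
#!/bin/sh
# Sketch: report gaps in a directory of archived WAL segments.
# Assumptions: 16MB segments (256 per log id), one timeline only,
# names of 24 uppercase hex digits, optionally followed by a suffix.

wal_gaps() {
    dir="$1"
    prev=""
    # Keep only the 24-digit segment name, drop duplicates, sort in order.
    ls "$dir" | grep -E '^[0-9A-F]{24}' | cut -c1-24 | sort -u |
    while read -r name; do
        log=$(printf '%s' "$name" | cut -c9-16)    # middle 8 hex digits
        seg=$(printf '%s' "$name" | cut -c17-24)   # last 8 hex digits
        n=$(( 0x$log * 256 + 0x$seg ))             # linear segment number
        if [ -n "$prev" ] && [ "$n" -ne $(( $prev + 1 )) ]; then
            echo "gap before $name: $(( $n - $prev - 1 )) segment(s) missing"
        fi
        prev=$n
    done
}

# Run only when called with a directory argument.
if [ "$#" -eq 1 ]; then
    wal_gaps "$1"
fi
```

No output means the names form a contiguous run. It says nothing about
whether the files themselves are intact, or whether the run starts at the
point the secondary actually needs.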

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
