Re: San replication corrupting postgres file...

From: Rahul Sharma <rahulsharma0525(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: San replication corrupting postgres file...
Date: 2017-05-01 20:20:02
Message-ID: CAC0Fff_aoXyraE1-r3t=ixez1Xj3t8NfCdC8hZti2pHJdDqHcg@mail.gmail.com
Lists: pgsql-admin

Hi Scott,

My architecture is as follows:

I have a primary server with its own LVM, with the data directories pointing
to its own SAN. On the DR end we have a similar setup, with its own LVM
pointing to its own directory structure and its own SAN. The
replication happens between the primary and DR SANs.

The reason we opted for this architecture is that we are using multiple
database types, and to maintain data integrity between them we take LVM-level
snapshots.

Thanks
Rahul

On Mon, May 1, 2017 at 2:39 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
wrote:

> On Mon, May 1, 2017 at 1:32 PM, Rahul Sharma <rahulsharma0525(at)gmail(dot)com>
> wrote:
> > Hi Team,
> >
> > I am facing an issue with postgres replication between my primary and DR
> > site. I have the following setup,
> >
> > 1. I am trying to replicate an LVM-level snapshot on a SAN that does
> > block-level replication.
> > 2. OS Details : RHEL 7.1 kernel 3.10
> > 3. Postgres Version : 9.6
> >
> > The steps performed:
> >
> > 1. Stop all the containers running on the OS.
> > 2. Stop the SAN level replication.
> > 3. Switch over to the replicated site.
> > 4. Start the containers
> >
> > Here the postgres container fails with the error below, which looks like
> > data corruption.
> >
> > ========
> >
> > LOG: database system was interrupted; last known up at 2017-04-28 15:58:45 UTC
> > LOG: invalid magic number 7270 in log segment 000000010000000000000001, offset 0
> > LOG: invalid primary checkpoint record
> > LOG: invalid magic number 7270 in log segment 000000010000000000000001, offset 0
> > LOG: invalid secondary checkpoint record
> > PANIC: could not locate a valid checkpoint record
> > LOG: startup process (PID 18) was terminated by signal 6: Aborted
> > LOG: aborting startup due to startup process failure
> > LOG: database system is shut down
> >
> > =======
> >
> > I have tried a graceful shutdown of the microservices, but the
> > replication still fails. The strange thing is that I have another instance
> > of postgres (9.4.1) which runs absolutely fine. Could someone please
> > provide some advice?
>
> Are your pg_xlog and data directories on different volumes? If so then
> vm snapshots are likely to not be coherent due to timing etc.
>
> Is there a reason you're NOT using pgsql's built in streaming replication?
>
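
Scott's first question above (whether pg_xlog and the data directory sit on
different volumes) can be checked roughly as follows. This is only a minimal
sketch, not something from the thread: the function names are made up for
illustration, the data directory path is whatever your installation uses, and
it assumes each LVM logical volume is mounted as its own filesystem, so that
paths on different volumes report different device IDs.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True if both paths live on the same mounted filesystem.

    os.stat() follows symlinks, so a pg_xlog symlink that points to
    another volume is resolved to its real target before comparing.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def xlog_shares_volume(data_dir: str) -> bool:
    """Check whether <data_dir>/pg_xlog is on the same volume as data_dir.

    pg_xlog is the WAL directory name in PostgreSQL 9.6 and earlier.
    The helper name is hypothetical, chosen for this illustration only.
    """
    xlog_dir = os.path.join(data_dir, "pg_xlog")
    return same_filesystem(data_dir, xlog_dir)
```

Calling xlog_shares_volume() with the container's data directory returns
False when the WAL directory resolves to a different device, which is the
case where separately taken snapshots may not be mutually consistent.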
