Re: replication consistency checking

From: hydra <hydrapolic(at)gmail(dot)com>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: replication consistency checking
Date: 2015-06-11 05:24:09
Message-ID: CAG6MAzQdRg9E1oM=idUFAwVPnEV5P3Cmz4B0vjwJaLRLc+0moQ@mail.gmail.com
Lists: pgsql-admin

On Wed, Jun 10, 2015 at 8:56 AM, Wei Shan <weishan(dot)ang(at)gmail(dot)com> wrote:

> That's called switchover my friend...
>

Thanks for the clarification.

So for you it's like this:
switchover - planned failover from master to slave (e.g. for upgrades or
better hardware)
failover - UNplanned promotion of the slave because the master has failed

If that is the accepted terminology across the whole PostgreSQL community,
then yes, let's call it switchover.

However, needing an online tool for data verification once your master has
already failed doesn't make much sense anyway ;) The tool is needed during
normal operation, so you can check the replication status periodically.
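A minimal building block for such a periodic check could be comparing WAL positions reported by pg_stat_replication. Here is a rough sketch in Python (the LSN arithmetic mirrors what pg_xlog_location_diff() computes in the 9.x series; the function names are my own, not an existing tool):

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL WAL location like '0/3000060' into an absolute
    byte position (high 32 bits / low 32 bits, both hexadecimal)."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)

def replay_lag_bytes(master_lsn: str, standby_replay_lsn: str) -> int:
    """Bytes of WAL the standby still has to replay; 0 means caught up.

    master_lsn would come from SELECT pg_current_xlog_location() on the
    master, standby_replay_lsn from the replay_location column of the
    pg_stat_replication view.
    """
    return lsn_to_bytes(master_lsn) - lsn_to_bytes(standby_replay_lsn)

# A cron job could alert whenever the lag stays above some threshold
# between two consecutive runs.
print(replay_lag_bytes("0/3000060", "0/3000000"))  # 96 bytes behind
```

This only proves the standby is applying WAL, of course; it says nothing about the data actually matching, which is why a checksum-style tool is still interesting.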

For example, Mozilla used to check their databases every 12 hours:
https://blog.mozilla.org/it/2013/12/16/upgrading-from-mysql-5-1-to-mariadb-5-5/

That is great. I have started to check at least once a day. Of course there
are multiple layers that make the data / backups more trustworthy:
- good hardware,
- setting up the db for safety rather than speed when the data are critical
(syncing data to disk, enabling data/binlog checksums),
- checking replication consistency (because we mostly take backups from the
slave),
- periodically restoring the slave backup to the test environment (each
day) so developers can check it

This has worked well for me so far, which is why I was asking for something
similar in the PostgreSQL world.
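Wei Shan's row-count approach quoted below can be sketched in a few lines of Python (a simplified illustration; the counts would come from SELECT count(*) run on each side, and the tolerance parameter is my own addition to absorb replication delay):

```python
def compare_row_counts(master, slave, tolerance=0):
    """Return tables whose row counts differ by more than `tolerance`.

    `master` and `slave` map table name -> row count as collected on the
    master and on the slave; a small tolerance absorbs replication delay.
    """
    suspicious = []
    for table in sorted(set(master) | set(slave)):
        m, s = master.get(table, 0), slave.get(table, 0)
        if abs(m - s) > tolerance:
            suspicious.append((table, m, s))
    return suspicious

print(compare_row_counts({"orders": 1000, "users": 50},
                         {"orders": 998, "users": 50}, tolerance=1))
# [('orders', 1000, 998)]
```

As noted in the thread, identical counts don't prove identical contents, so this is a cheap sanity check rather than real verification.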

Thanks guys for the tips / notes so far.

> On 10 June 2015 at 02:57, hydra <hydrapolic(at)gmail(dot)com> wrote:
>
>>
>>
>> On Tue, Jun 9, 2015 at 6:06 AM, Wei Shan <weishan(dot)ang(at)gmail(dot)com> wrote:
>>
>>> Hi,
>>>
>>> Usually we don't do a data consistency check for a failover, because it's
>>> an unplanned failover! In my opinion, it's quite safe to trust
>>> pg_stat_replication because it's binary-level replication, so it's hard to
>>> lose rows unless there is a bug. You have a higher chance of hitting
>>> corruption. However, for a switchover, we will do the following:
>>>
>>> 1. Check the pg_stat_replication view
>>> 2. Have a script that does a row count on all the data tables over a period
>>> of time. We should see that the two databases have very similar row counts,
>>> with maybe a slight delay of 1s?
>>>
>>> I know that this is not perfect, but at least it's something to start
>>> with. The only tool out there that's doing this is Oracle VeriData, but I'm
>>> not too sure how they are doing it.
>>>
>>> Cheers.
>>>
>>>
>> Hello, there are situations where you want to do a failover and not just
>> because it's unplanned. For example, when migrating to better hardware.
>>
>>
>>
>>> On 6 June 2015 at 12:43, hydra <hydrapolic(at)gmail(dot)com> wrote:
>>>
>>>>
>>>>
>>>> On Fri, Jun 5, 2015 at 4:56 PM, Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
>>>> wrote:
>>>>
>>>>> On Jun 5, 2015, at 8:42 AM, Igor Neyman <ineyman(at)perceptron(dot)com>
>>>>> wrote:
>>>>> >
>>>>> > The problem I see with a “checksum utility” is that for it to work,
>>>>> > both compared servers should be “static”: no transactions while it
>>>>> > does its job.
>>>>>
>>>>> Indeed, and that was brought up before; the OP seems to be ignoring it.
>>>>> What magic does MySQL (supposedly) use to compare databases without
>>>>> interfering with updates?
>>>>>
>>>>> One could imagine a built-in feature in PG which depends on using MVCC
>>>>> and having both sides look at the same snapshot. (Which would require
>>>>> repeatable reads.)
>>>>>
>>>>> But for an external utility, that's quite a bit harder. One suggestion
>>>>> which would involve minimal interruption to processing: if you have a
>>>>> snapshottable file system, shut down master, take snapshot, shut down
>>>>> replica, bring master back up, snapshot replica, bring it back up. You
>>>>> *still* have the issue of making sure that at the moment you take master
>>>>> down all changes have been streamed to replica, and the (easier) issue of
>>>>> making sure replica has applied them before taking its snapshot... But if
>>>>> you can manage to pull that off, then you can checksum & compare as much as
>>>>> you want, *IF* you can actually pull that off correctly ;-)
>>>>>
>>>>>
>>>>
>>>> Hello,
>>>> I wasn't talking about a static check; I was indeed referring to an
>>>> online tool.
>>>>
>>>> I haven't read the source of the MySQL tool yet; however, it computes
>>>> a checksum on the master and writes that operation to the binary log, and
>>>> under statement-based replication the slave then computes the checksum
>>>> itself - the two results can be compared. Not all data in the table are
>>>> checksummed at once; smaller chunks are used instead.
>>>>
>>>> As Igor mentioned before, that tool can also fail and is not 100%
>>>> reliable. But I suppose that if it were misbehaving, chances are the
>>>> checksums would differ and you would notice it. It will probably not
>>>> compute the same CRC/MD5 by accident.
>>>>
>>>> I have done some tests with it: comparing data after setting up
>>>> replication went fine, and when I then changed an integer on the slave,
>>>> it really computed a different checksum, as expected. I also did a
>>>> logical dump compare; the tool really seems to be doing its job fine.
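For what it's worth, the chunking idea described above can be sketched like this (a toy illustration only, not the MySQL tool's actual implementation; a table is modelled as an ordered list of row tuples):

```python
import hashlib

def chunk_checksums(rows, chunk_size=1000):
    """Checksum a table in fixed-size chunks rather than all at once.

    `rows` is an ordered iterable of row tuples; the result is a list of
    (chunk_index, hex_digest) pairs that master and slave can compare to
    narrow a mismatch down to a single chunk.
    """
    sums, chunk, index = [], [], 0
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            sums.append((index, hashlib.md5(repr(chunk).encode()).hexdigest()))
            chunk, index = [], index + 1
    if chunk:  # trailing partial chunk
        sums.append((index, hashlib.md5(repr(chunk).encode()).hexdigest()))
    return sums

master = [(i, "name-%d" % i) for i in range(2500)]
slave = list(master)
slave[1700] = (1700, "corrupted")  # simulate the changed-value test above

mismatch = [i for (i, a), (_, b) in zip(chunk_checksums(master),
                                        chunk_checksums(slave)) if a != b]
print(mismatch)  # only the chunk containing row 1700 differs: [1]
```

The real tool also has to pin down a consistent chunk boundary on both sides while writes continue, which is the hard part this sketch deliberately skips.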
>>>>
>>>>
>>>>
>>>>> --
>>>>> Scott Ribe
>>>>> scott_ribe(at)elevated-dev(dot)com
>>>>> http://www.elevated-dev.com/
>>>>> https://www.linkedin.com/in/scottribe/
>>>>> (303) 722-0567 voice
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Regards,
>>> Ang Wei Shan
>>>
>>
>>
>
>
> --
> Regards,
> Ang Wei Shan
>
