Re: Corrupted Data ?

From: Ioana Danes <ioanadanes(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Corrupted Data ?
Date: 2016-08-12 13:09:33
Message-ID: CAPg0s+5V1b72xx7yT8N_9jz1nsUdYHHaWG4nchaHqE3eghce6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Hello Everyone,

I have new information on this case. I also open a post for Bucardo
because I am still not sure what triggers this problem.

The problem happened again on the same table but on another field. Few days
ago I started a fourth database called drdb that is a PITR slave from db3.

DB1, DB2, DRDB:

-[ RECORD 1 ]-------+--------------------------
gameplayid | 324455566
systemuserid | 515151
gameid | 41
transactionid | 666556533
drawid | *318220*
....

DB3:
-[ RECORD 1 ]-------+--------------------------
gameplayid | 32445556
systemuserid | 515151
gameid | 41
transactionid | 666556533
drawid | *318216*

Here are the facts I know:

August 10 @ 11:10
- The record was created on db1 and replicated to db2 and db3
August 11 @ 2:30
- db1, db2 and db3 are in sync (I have a script that compares the data
for all 3 dbs every night @ 2:30 am)
August 12 @ 2:30
- db3 is out of sync because of this field (drawid)
- drdb (which is PITRed from db3) is in sync with db1 and db2?????

Because drdb (PITR slave from db3) is in sync with db1 and db2 and because
the base backup was taken before the record in case was created, I believe
that the xlogs are fine and I have a data kind of corruption on db3 on the
data file for that table that happened after August 11 @ 2:30 (because the
compare script found the dbs in sync)...

Also the index is correct on db3 as the record in case (with drawid =
*318216*) is retrieved if I filter by drawid = *318220*

Any help is greatly appreciated,

Thank you

On Mon, Aug 8, 2016 at 1:25 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 08/08/2016 10:06 AM, Ioana Danes wrote:
>
>>
>>
>> On Mon, Aug 8, 2016 at 12:55 PM, Adrian Klaver> 75315811???
>>
>>
>> Corrupted index on db3?
>>
>> yes
>>
>>
>> Might want to look in the db3 logs to see if anything pops out.
>>
>> I checked the logs, no traces of errors or corruption.
>>
>>
>> I just do not know enough about Burcardo to be of much help beyond
>> that.
>>
>>
>>
>> it is trigger based, it saves the ids of the inserted record in a delta
>> table and then on sync it creates copy commands to the slave. Even if
>> there is a bug or corruption in that process I don't see how that
>> corrupts the index on db3...
>>
>
> It seems to do more then that:
>
> https://bucardo.org/wiki/Bucardo/Documentation/Overview
>
> That is why I suggested the post to the Burcardo list. Folks there will
> have a better idea what goes under the hood.
>
> There is also this from a previous post:
>
> "Only one master is active at one time the other one is in stand by that
> is a topic for another discussion but in our case that works well."
>
> Have no idea how that interaction plays out.
>
> At this point what I see is:
>
> 1) Data is entered on a master and is correct there.
>
> 2) Data is replicated to a single standby from one of two possible sources
> via Bucardo and is no longer correct.
>
> 3) Now Bucardo uses Postgres to do its work so it is possible that
> something in Postgres is at fault. Still the fact that the data is good on
> the master but not in the standby tends to indicate that the act of
> replication is the issue.
>
> 4) Exactly how that replication is accomplished is not obvious to me.
>
>
>
>
>>
>> So it is either replication bug + index corruption on db3 or data
>> corruption on db3...
>>
>>
>> In response to Melvin, the query returns no rows:
>>
>>
>> SELECT n.nspname,
>> i.relname,
>> i.indexrelname,
>> CASE WHEN idx.indisprimary
>> THEN 'pkey'
>> WHEN idx.indisunique
>> THEN 'uidx'
>> ELSE 'idx'
>> END AS type,
>> 'INVALID'
>> FROM pg_stat_all_indexes i
>> JOIN pg_class c ON (c.oid = i.relid)
>> JOIN pg_namespace n ON (n.oid = c.relnamespace)
>> JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
>> WHERE idx.indisvalid = FALSE
>> ORDER BY 1, 2;
>>
>> nspname | relname | indexrelname | type | ?column?
>> ---------+---------+--------------+------+----------
>> (0 rows)
>>
>>
>> Thank you for your thoughts,
>> ioana
>>
>>
>>
>> Thanks,
>> ioana
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
>> <mailto:adrian(dot)klaver(at)aklaver(dot)com
>> <mailto:adrian(dot)klaver(at)aklaver(dot)com>>
>>
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
>>
>>
>>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Melvin Davidson 2016-08-12 13:36:18 Re: Corrupted Data ?
Previous Message Chris Travers 2016-08-12 11:32:33 Re: Postgres Pain Points 2 ruby / node language drivers