Re: Corrupted Data ?

From: Ioana Danes <ioanadanes(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Corrupted Data ?
Date: 2016-08-08 17:06:08
Message-ID: CAPg0s+4WpL0W2mbvRbkt=67bRWttSUiJVJPwcTEVT_5Okw5jyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Aug 8, 2016 at 12:55 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
wrote:

> On 08/08/2016 09:47 AM, Ioana Danes wrote:
>
>>
>>
>> On Mon, Aug 8, 2016 at 12:37 PM, Adrian Klaver
>> <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>> wrote:
>>
>> On 08/08/2016 09:28 AM, Ioana Danes wrote:
>>
>>
>>
>> On Mon, Aug 8, 2016 at 12:19 PM, Adrian Klaver
>> <adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
>> <mailto:adrian(dot)klaver(at)aklaver(dot)com
>> <mailto:adrian(dot)klaver(at)aklaver(dot)com>>> wrote:
>>
>> On 08/08/2016 09:11 AM, Ioana Danes wrote:
>>
>> Hi,
>>
>> I suspect I am having a case of data corruption. Here
>> are the
>> details:
>>
>> I am running postgres 9.4.8:
>>
>> postgresql94-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-contrib-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-libs-9.4.8-1PGDG.rhel7.x86_64
>> postgresql94-server-9.4.8-1PGDG.rhel7.x86_64
>>
>> on CentOS Linux release 7.2.1511 (Core)
>>
>> This is happening in a production environment but
>> luckily on the
>> reporting database.
>> I have a cluster of 3 databases, db1 and db2 are masters
>> and
>> replicate
>> between each other and also replicate to db3 (db1 <->
>> db2, db1
>> -> db3,
>> db2 -> db3).
>> For replication I am using Bucardo.
>>
>>
>> I would say this is more a question for the Burcardo list:
>>
>> https://mail.endcrypt.com/mailman/listinfo/bucardo-general
>> <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>
>> <https://mail.endcrypt.com/mailman/listinfo/bucardo-general
>> <https://mail.endcrypt.com/mailman/listinfo/bucardo-general>>
>>
>> I am just not seeing that replicating two masters on to a
>> single
>> database is going to end well.
>>
>>
>> Only one master is active at one time the other one is in stand
>> by that
>> is a topic for another discussion but in our case that works well.
>>
>> That was my first assumption, that it is a kind of a race
>> condition or a
>> bug on replication but I quickly ruled that out because that
>> does not
>> explain why when I filtered the table by transactionid = 75315815
>> it
>> shows one record with transactionid 75315811...
>>
>> select gameplayid, transactionid, encodedplay from
>> abrazo.matchgameplay
>> where transactionid in (75315815) order by transactionid;;
>> gameplayid | transactionid | encodedplay
>> ------------+---------------+--------------
>> 160019271 | 75315815 | mix:9,0,9
>> 160019269 | 75315815 | mix:9,8,9
>> 160019267 | 75315815 | mix:9,2,2
>> 160019265 | 75315815 | mix:2,2,8
>> 160019263 | *75315811 *| backup:1,9,1
>> 160019261 | 75315815 | backup:2,0,9
>>
>> So I don't think it is a replication issue...
>>
>>
>> Other that, if I am following correctly, it is on the database(db3)
>> being replicated to. The only way db3 is getting its data is through
>> replication, is that correct?. On the master databases the data is
>> correct.
>>
>> OK, let's assume that what you're saying is correct and the replication
>> has a bug, or corruption or whatever that is and the record gets created
>> with transactionid = 75315811. Bucardo replication is trigger based and
>> it is using a copy command to insert the new records into the replicated
>> database.
>>
>> Then how can I explain that my query select gameplayid, transactionid,
>> encodedplay from abrazo.matchgameplay where transactionid in (75315815)
>> order by transactionid; returns me a record with transactionid 75315811???
>>
>
> Corrupted index on db3?
>
> yes

> Might want to look in the db3 logs to see if anything pops out.
>
> I checked the logs, no traces of errors or corruption.

> I just do not know enough about Burcardo to be of much help beyond that.
>

it is trigger based, it saves the ids of the inserted record in a delta
table and then on sync it creates copy commands to the slave. Even if there
is a bug or corruption in that process I don't see how that corrupts the
index on db3...

So it is either replication bug + index corruption on db3 or data
corruption on db3...

In response to Melvin, the query returns no rows:

SELECT n.nspname,
i.relname,
i.indexrelname,
CASE WHEN idx.indisprimary
THEN 'pkey'
WHEN idx.indisunique
THEN 'uidx'
ELSE 'idx'
END AS type,
'INVALID'
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid = i.indexrelid )
WHERE idx.indisvalid = FALSE
ORDER BY 1, 2;

nspname | relname | indexrelname | type | ?column?
---------+---------+--------------+------+----------
(0 rows)

Thank you for your thoughts,
ioana

>
>
>> Thanks,
>> ioana
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)aklaver(dot)com <mailto:adrian(dot)klaver(at)aklaver(dot)com>
>>
>>
>>
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)aklaver(dot)com
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2016-08-08 17:25:21 Re: Corrupted Data ?
Previous Message Adrian Klaver 2016-08-08 16:55:18 Re: Corrupted Data ?