Re: Corrupted Data ?

From: Ioana Danes <ioanadanes(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: Francisco Olarte <folarte(at)peoplecall(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Corrupted Data ?
Date: 2016-08-12 15:49:42
Message-ID: CAPg0s+6iXfQ4gz1rCUfdg=60FSfv4Twb2qyObBbawk9vTznQ5A@mail.gmail.com
Lists: pgsql-general

On Fri, Aug 12, 2016 at 11:44 AM, Ioana Danes <ioanadanes(at)gmail(dot)com> wrote:

>
>
> On Fri, Aug 12, 2016 at 11:34 AM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>
>> On 08/12/2016 08:30 AM, Ioana Danes wrote:
>>
>>>
>>>
>>> On Fri, Aug 12, 2016 at 11:26 AM, Adrian Klaver
>>> <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>>>
>>> On 08/12/2016 08:10 AM, Ioana Danes wrote:
>>>
>>>
>>>
>>> On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
>>> <folarte(at)peoplecall(dot)com> wrote:
>>>
>>> CCing to the list...
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes
>>> <ioanadanes(at)gmail(dot)com> wrote:
>>> >> given 318220 and 318216 are just a bit away ( 4db0c/4db08 ), and it
>>> >> repeats sporadically, have you ruled out ( by having page checksums or
>>> >> other mechanism ) a potential disk read/write error ?
>>> >>
>>> >>
>>> >> > Also the index is correct on db3 as the record in case (with drawid =
>>> >> > 318216) is retrieved if I filter by drawid = 318220
>>> >>
>>> >> Especially if this happens, you may have some slightly bad disks/RAM
>>> >> leading to this kind of problem.
>>> >>
>>> >
>>> > Could be. I also had some issues with an rsync between db3 and drdb a week
>>> > ago that did not complete for bigger files (> 200MB) and gave me some
>>> > corruption messages. Then the system was rebooted and everything seemed
>>> > fine, but apparently it is not.
>>> > I am planning to drop & create the table from a good backup and if that does
>>> > not fix the issue then I will rebuild the server.
>>>
>>> I would check whatever logs you can ( syslog or eventlog, smart log,
>>> etc. ) hunting for disk errors ( sometimes they are reported ). This
>>> kind of problem, with programs as tested as postgres and rsync, tends
>>> to indicate controller/RAM/disk going bad ( in your case it could be
>>> caused by a single bit getting flipped in a sector for the data
>>> portion of the table, and not being propagated either because it
>>> happened after your sync of drdb or because it was synced from the WAL
>>> and not the table, or because it was read from the disk cache ).
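
A minimal sketch of the "just a bit away" observation quoted above: the two drawid values, 318220 and 318216, differ in exactly one binary digit, which is the pattern a single flipped bit would leave behind. The helper name `differing_bits` is hypothetical and only for illustration; it is not part of postgres or any tool mentioned in this thread.

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the positions (0 = least significant) where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 in every position where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


filtered_by = 318220   # drawid value used in the filter
stored_value = 318216  # drawid value shown in the returned record

# Print both values in hex to compare with the 4db0c / 4db08 pair in the thread.
print(hex(filtered_by), hex(stored_value))

# A single-element list means the values are exactly one bit flip apart.
print(differing_bits(filtered_by, stored_value))
```

A one-bit difference is consistent with a hardware fault, but it does not prove one; it only shows that the hypothesis fits these two values.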
>>>
>>> I agree; unfortunately I did not find any clues about corruption or any
>>> anomalies in the logs.
>>> I will work tonight to rebuild that table and see where I go from there.
>>>
>>>
>>> The db3 database is on a different machine from all the other
>>> databases you set up, correct?
>>>
>>> Yes, they are all different VMs. The first 3 dbs are on the same cluster but
>>> drdb is a remote machine.
>>>
>>
>> Aah, another player in the mix.
>>
>> What virtualization technology are you using?
>>
>
> kvm
>
Sorry, I should add more info:
kernel 4.7
and the filesystem is xfs vs ext3/ext4

>
>>
>>> Thank you
>>>
>>>
>>>
>>> Thanks,
>>> ioana
>>>
>>> Francisco Olarte.
>>>
>>>
>>>
>>>
>>> --
>>> Adrian Klaver
>>> adrian(dot)klaver(at)aklaver(dot)com
>>>
>>>
>>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)aklaver(dot)com
>>
>
>
