From: | Ioana Danes <ioanadanes(at)gmail(dot)com> |
---|---|
To: | Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> |
Cc: | Francisco Olarte <folarte(at)peoplecall(dot)com>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org> |
Subject: | Re: Corrupted Data ? |
Date: | 2016-08-12 15:49:42 |
Message-ID: | CAPg0s+6iXfQ4gz1rCUfdg=60FSfv4Twb2qyObBbawk9vTznQ5A@mail.gmail.com |
Lists: | pgsql-general |
On Fri, Aug 12, 2016 at 11:44 AM, Ioana Danes <ioanadanes(at)gmail(dot)com> wrote:
>
>
> On Fri, Aug 12, 2016 at 11:34 AM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com
> > wrote:
>
>> On 08/12/2016 08:30 AM, Ioana Danes wrote:
>>
>>>
>>>
>>> On Fri, Aug 12, 2016 at 11:26 AM, Adrian Klaver
>>> <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>>>
>>> On 08/12/2016 08:10 AM, Ioana Danes wrote:
>>>
>>>
>>>
>>> On Fri, Aug 12, 2016 at 10:47 AM, Francisco Olarte
>>> <folarte(at)peoplecall(dot)com>
>>> wrote:
>>>
>>> CCing to the list...
>>>
>>> Thanks
>>>
>>>
>>> On Fri, Aug 12, 2016 at 4:10 PM, Ioana Danes
>>> <ioanadanes(at)gmail(dot)com>
>>> wrote:
>>> >> given 318220 and 318216 are just a bit away ( 4db08/4db0c
>>> ), and it
>>> >> repeats sporadically, have you ruled out ( by having page
>>> checksums or
>>> >> other mechanism ) a potential disk read/write error ?
>>> >>
>>> >>
>>> >> > Also the index is correct on db3 as the record in case
>>> (with
>>> drawid =
>>> >> > 318216) is retrieved if I filter by drawid = 318220
>>> >>
>>> >> Especially if this happens, you may have some slightly bad
>>> disks/ram/
>>> >> leading to this kind of problems.
>>> >>
>>> >
>>> > Could be. I also had some issues with an rsync between db3
>>> and
>>> drdb a week
>>> > ago that did not complete for bigger files (> 200MB) and
>>> gave me some
>>> > corruption messages. Then the system was rebooted and
>>> everything
>>> seemed
>>> > fine but apparently it is not.
>>> > I am planning to drop & create the table from a good
>>> backup and if
>>> that does
>>> > not fix the issue then I will rebuild the server.
>>>
>>> I would check whatever logs you can ( syslog or eventlog,
>>> smart log,
>>> etc.. ) hunting for disk errors ( sometimes they are
>>> reported ). This
>>> kind of problems, with programs as tested as postgres and
>>> rsync, tend
>>> to indicate controller/RAM/disk going bad ( in your case it
>>> could be
>>> caused by a single bit getting flipped in a sector for the
>>> data
>>> portion of the table, and not being propagated either
>>> because it
>>> happened after your sync of drdb or because it was synced
>>> from the WAL
>>> and not the table, or because it was read from the disk
>>> cache ).
>>>
>>> I agree, unfortunately I did not find any clues about corruption
>>> or any
>>> anomalies in the logs.
>>> I will work tonight to rebuild that table and see where I go
>>> from there.
>>>
>>>
>>> The db3 database is on a different machine from all the other
>>> databases you set up, correct?
>>>
>>> Yes, they are all different VMs. The first 3 dbs are on the same cluster, but
>>> drdb is a remote machine.
>>>
>>
>> Aah, another player in the mix.
>>
>> What virtualization technology are you using?
>>
>
> kvm
>
Sorry, I should add more info:
kernel 4.7
and the filesystem is xfs (as opposed to ext3/ext4)
>
>>
>>> Thank you
>>>
>>>
>>>
>>> Thanks,
>>> ioana
>>>
>>> Francisco Olarte.
>>>
>>>
>>>
>>>
>>> --
>>> Adrian Klaver
>>> adrian(dot)klaver(at)aklaver(dot)com
>>>
>>>
>>>
>>
>> --
>> Adrian Klaver
>> adrian(dot)klaver(at)aklaver(dot)com
>>
>
>
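
A side note on the 4db08/4db0c observation quoted above: the single-bit-flip hypothesis can be checked with a little arithmetic. The following is only a minimal sketch in Python, not something from the original thread; the function name differing_bits is made up for illustration. It XORs the two drawid values and lists the bit positions where they differ.

```python
def differing_bits(a: int, b: int) -> list:
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b  # XOR leaves a 1 only where the two values disagree
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# The two drawid values mentioned in the thread.
stored, expected = 318216, 318220

print(hex(stored), hex(expected))        # 0x4db08 0x4db0c
print(differing_bits(stored, expected))  # [2]
```

The two values differ only in bit 2 (value 4), which is consistent with a single flipped bit, though it does not by itself prove a hardware fault.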