From: Karthik Iyer <karthik(dot)i(at)directi(dot)com>
To: Shaun Thomas <sthomas(at)optionshouse(dot)com>, pgsql-general(at)postgresql(dot)org
Cc: KiritSinh <kirit(dot)p(at)directi(dot)com>
Subject: Re: DATA corruption after promoting slave to master
Date: 2014-06-26 14:44:46
Message-ID: 53AC31DE.7030902@directi.com
Lists: pgsql-general
Thanks Shaun.
We reindexed all the primary and unique keys of all the tables, but we
did not reindex the tables themselves. Do you think we should do that as well?
Also, do you think we should do a clean dump and restore to eliminate
all data inconsistencies?
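For reference, the distinction we are asking about: reindexing only the
constraint indexes rebuilds just those, while a table-level REINDEX rebuilds
every index on the table, including non-unique ones. A minimal sketch
(index and table names here are placeholders, not our real schema):

```sql
-- Rebuild a single index (what we have already done for the PK/unique keys):
REINDEX INDEX broken_table_pkey;

-- Rebuild every index on the table, including any non-unique ones:
REINDEX TABLE broken_table;
```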
One more question:
We managed to get the old server up. We are trying to replay the
difference in data by checking the log files (statement logs). Do you
think there is any easier alternative?
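To illustrate the approach we are taking: with log_statement = 'all', each
executed statement appears in the stderr log after a "LOG:  statement:"
marker, so the SQL can be stripped out for replay. A rough sketch (the file
names and log lines below are made up for illustration; real log line
prefixes depend on log_line_prefix):

```shell
# Build a small sample log to demonstrate the filter (illustrative only).
printf '%s\n' \
  "2014-06-26 10:00:01 UTC LOG:  statement: INSERT INTO t VALUES (1);" \
  "2014-06-26 10:00:02 UTC LOG:  checkpoint starting: time" \
  "2014-06-26 10:00:03 UTC LOG:  statement: UPDATE t SET x = 2;" \
  > sample.log

# Keep only the SQL text after the "statement:" marker; non-statement
# log lines (like the checkpoint message) are dropped.
sed -n 's/.*LOG:  statement: //p' sample.log > replay.sql
cat replay.sql
```

Note this only captures single-line statements as-is; multi-line statements
and concurrent sessions make naive log replay unreliable, which is part of
why we are asking about alternatives.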
Thanks in advance,
- Karthik
On Thursday 26 June 2014 01:09 AM, Shaun Thomas wrote:
> On 06/25/2014 06:29 AM, Karthik Iyer wrote:
>
>> [2] We also have a daemon process which copies the latest partial WAL
>> log file (which postgres is currently writing to, under pg_xlog/) every
>> 3 secs to a different location.
>
> No. No, no, no, no no. No.
>
> Also, no. Partial WAL files are not valid for recovery. In fact, I'm
> surprised the standby node even applied them at all.
>
>> We are seeing these problems in the newly promoted master now:
>>
>> 1. when we run queries on the primary key, we don't get the rows even
>> if they exist in the db. However, if we force the query not to use the
>> index, we get those entries.
>> 2. there are duplicate values for primary keys
>
> This is no surprise. Your slave has partial data commits, which means
> your table, index, or possibly both, are corrupt.
>
> The first thing you need to do is back up any tables you've noticed
> are having this behavior. Second, try to reindex the tables that are
> having problems. The errors you are seeing are due to the data and
> indexes being out of sync. If you get an error that says the reindex
> fails due to duplicate values, you can do this:
>
> SET enable_indexscan TO false;
> SET enable_bitmapscan TO false;
>
> SELECT primary_key, count(1)
> FROM broken_table
> GROUP BY 1
> HAVING count(1) > 1;
>
> For any ID that comes back, do this:
>
> SELECT ctid, *
> FROM broken_table
> WHERE primary_key = [value(s) from above];
>
> Then you need to delete one of the bad rows after deciding which. Use
> the CTID of the row you want to delete:
>
> DELETE FROM broken_table
> WHERE ctid = 'whatever';
>
> Then reindex the table so the correct values are properly covered.
> Doing this for all of your corrupt tables may take a while depending
> on how many there are.
>