From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Chris Angelico <rosuav(at)gmail(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Plug-pull testing worked, diskchecker.pl failed |
Date: | 2012-10-22 20:26:03 |
Message-ID: | CAMkU=1wcXdtMPDjXJX5nhOt8i=iGQB=z22FxDBPj-ms8Q-Y2FQ@mail.gmail.com |
Lists: | pgsql-general |
On Mon, Oct 22, 2012 at 12:31 PM, Chris Angelico <rosuav(at)gmail(dot)com> wrote:
> On Tue, Oct 23, 2012 at 6:26 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> What did you do to look for corruption? That PostgreSQL succeeds at
>> going through crash-recovery and then starting up is not a good
>> indicator that there is no corruption.
>
> I fired up Postgres and looked at the logs for any signs of failure.
>
>> Did you do something like compute the aggregates on pgbench_history
>> and compare those aggregates to the balances in the other 3 tables?
>
> No, didn't do that. My next check will be done over the network
> (similar to diskchecker), with a script that fires off requests, waits
> for them to be confirmed committed, and then records a local copy, and
> will check that local copy once the server's back up again. That'll
> tell me if transactions are being lost.
If you like Perl, the count.pl from this message might be a useful
starting point:
http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php
It was designed to check consistency after postmaster crashes, not OS
crashes, so the checker runs on the same host as postgres does.
Obviously, for a pull-the-plug test you need to run it on a different host,
so all the
DBI->connect(....)
calls need to be changed to do that.
> I'm kinda feeling my way in the dark here. Will check out the
> aggregates on pgbench_history when I get to work today; thanks for the
> tip!
Here's an example with pgbench_accounts; the other two (pgbench_tellers
and pgbench_branches) should look analogous.
select aid, abalance, count(*)
from (
    select aid, abalance from pgbench_accounts
    union all
    select aid, sum(delta) from pgbench_history group by aid
) as foo
group by aid, abalance
having abalance != 0 and count(*) != 2;
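This should return zero rows: every non-zero balance should appear exactly
twice, once from pgbench_accounts and once as the sum of its deltas in
pgbench_history. Any returned row indicates corruption.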
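The same check can be applied to all three balance tables in one pass. The following is only a rough sketch under stated assumptions: it builds the analogous query for each table from the standard pgbench column names (aid/abalance, tid/tbalance, bid/bbalance) and runs it over any Python DB-API connection. The helper names (consistency_query, find_mismatches, demo_db) are made up for illustration, and the toy database uses sqlite3 purely so the example runs without a server; against a real cluster you would pass a PostgreSQL connection from a different host instead.

```python
import sqlite3

# (table, key column, balance column) for the three pgbench tables that
# carry running balances; pgbench_history records every delta applied.
BALANCE_TABLES = [
    ("pgbench_accounts", "aid", "abalance"),
    ("pgbench_tellers", "tid", "tbalance"),
    ("pgbench_branches", "bid", "bbalance"),
]


def consistency_query(table, key, balance):
    """Build the check from the message for one table.

    The identifiers come from the fixed list above, never from user input.
    """
    return (
        f"select {key}, {balance}, count(*) from ("
        f"select {key}, {balance} from {table} "
        f"union all "
        f"select {key}, sum(delta) from pgbench_history group by {key}"
        f") as foo group by {key}, {balance} "
        f"having {balance} != 0 and count(*) != 2"
    )


def find_mismatches(conn):
    """Return {table: rows}; every list should be empty if nothing was lost."""
    result = {}
    for table, key, balance in BALANCE_TABLES:
        cur = conn.cursor()
        cur.execute(consistency_query(table, key, balance))
        result[table] = cur.fetchall()
    return result


def demo_db():
    """Hypothetical toy data: a tiny in-memory stand-in for a pgbench database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table pgbench_accounts (aid integer, abalance integer);
        create table pgbench_tellers (tid integer, tbalance integer);
        create table pgbench_branches (bid integer, bbalance integer);
        create table pgbench_history (tid integer, bid integer,
                                      aid integer, delta integer);
        insert into pgbench_accounts values (1, 30), (2, 0);
        insert into pgbench_tellers values (1, 30);
        insert into pgbench_branches values (1, 30);
        insert into pgbench_history values (1, 1, 1, 10), (1, 1, 1, 20);
    """)
    return conn
```

On the toy data every list comes back empty. If a balance update is lost (for example, account 1 ends up with 10 while its history still sums to 30), the accounts check returns one row for each of the two disagreeing values.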
pgbench truncates pgbench_history, but does not reset the balances to
zero on the other tables. So if you want to run the test repeatedly,
you have to do pgbench -i between runs, or manually reset the balance
columns.
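For the manual route, a minimal sketch of the reset might look like the following. It assumes the standard pgbench column names (abalance, tbalance, bbalance) and any Python DB-API connection; reset_balances and demo_db are hypothetical names, and sqlite3 is used only so the example runs on its own. pgbench_history is left alone here because pgbench truncates it itself.

```python
import sqlite3

# One statement per table that keeps a running balance.
RESET_STATEMENTS = [
    "update pgbench_accounts set abalance = 0",
    "update pgbench_tellers set tbalance = 0",
    "update pgbench_branches set bbalance = 0",
]


def reset_balances(conn):
    """Zero every balance column so the next run starts from a known state."""
    cur = conn.cursor()
    for stmt in RESET_STATEMENTS:
        cur.execute(stmt)
    conn.commit()


def demo_db():
    """Hypothetical toy data: balances left over from a previous run."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table pgbench_accounts (aid integer, abalance integer);
        create table pgbench_tellers (tid integer, tbalance integer);
        create table pgbench_branches (bid integer, bbalance integer);
        insert into pgbench_accounts values (1, 30), (2, -5);
        insert into pgbench_tellers values (1, 25);
        insert into pgbench_branches values (1, 25);
    """)
    return conn
```

Running pgbench -i instead rebuilds the tables from scratch, which is simpler but takes longer at large scale factors.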
Cheers,
Jeff