From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: regression test failed when enabling checksum |
Date: | 2013-04-02 03:02:20 |
Message-ID: | CAMkU=1x393o6hvJ8Bp0Pk+P4Ad-DdNedUov4cf-aNKssTsv+xg@mail.gmail.com |
Lists: | pgsql-hackers |
On Monday, April 1, 2013, Jeff Davis wrote:
> On Mon, 2013-04-01 at 10:37 -0700, Jeff Janes wrote:
>
> > Over 10,000 cycles of crash and recovery, I encountered two cases of
> > checksum failures after recovery, example:
> >
> >
> > 14264 SELECT 2013-03-28 13:08:38.980 PDT:WARNING: page verification
> > failed, calculated checksum 7017 but expected 1098
> > 14264 SELECT 2013-03-28 13:08:38.980 PDT:ERROR: invalid page in block
> > 77 of relation base/16384/2088965
> >
> > 14264 SELECT 2013-03-28 13:08:38.980 PDT:STATEMENT: select sum(count)
> > from foo
>
> It would be nice to know whether that's an index or a heap page.
>
It is a heap page for the table jjanes.public.foo.
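For readers following the log excerpt, here is a minimal sketch of how the path and block number in the error message break down. It assumes the usual `base/<database OID>/<relfilenode>` layout and the default 8192-byte block size; the helper name `locate_block` is hypothetical and not part of PostgreSQL.

```python
def locate_block(rel_path: str, block_no: int, block_size: int = 8192) -> dict:
    """Split a 'base/<db oid>/<relfilenode>' path and compute the block's byte offset.

    Assumptions: default tablespace layout, default block size of 8192 bytes,
    and a block that lives in the first segment file of the relation.
    """
    parts = rel_path.split("/")
    if len(parts) != 3 or parts[0] != "base":
        raise ValueError(f"unexpected relation path: {rel_path!r}")
    return {
        "database_oid": int(parts[1]),
        "relfilenode": int(parts[2]),
        # Offset of the first byte of the block within the relation file.
        "byte_offset": block_no * block_size,
    }


info = locate_block("base/16384/2088965", 77)
print(info)
```

The relfilenode can then be looked up in pg_class (in the database with that OID) to see whether the file belongs to a heap or an index.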
>
> >
> > In both cases, the bad block (77 in this case) is the same block that
> > was intentionally partially-written during the "crash". However, that
> > block should have been restored from the WAL FPW, so its fragmented
> > nature should not have been present in order to be detected. Any idea
> > what is going on?
>
> Not right now. My primary suspect is what's going on in
> visibilitymap_set() and heap_xlog_visible(), which is more complex than
> some of the other code. That would require some VACUUM activity, which
> isn't in your workload -- do you think autovacuum may kick in sometimes?
>
Yes, a modification to my test harness that I failed to mention is that it
now sleeps for 2 minutes after every 100 rounds of crash/recovery,
specifically so that autovac has a chance to kick in and run to completion.
I made that change to avoid wrap-around shut-downs on long running
tests. However, "foo" is truncated at the beginning of every test, so I
don't think this would be relevant to that table, as any poisoned fruits of
the autovac would be discarded with the truncation.
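The pause schedule described above can be sketched roughly as follows. This is not the actual harness: `run_cycles` and `run_round` are hypothetical names, and `run_round` stands in for whatever performs one crash/recovery round (including the truncation of "foo" at the start of each test).

```python
import time
from typing import Callable


def run_cycles(
    total_rounds: int,
    run_round: Callable[[int], None],
    pause_every: int = 100,       # rounds between pauses, per the description above
    pause_seconds: float = 120.0, # 2 minutes, to let autovacuum run to completion
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run crash/recovery rounds, pausing after every `pause_every` rounds.

    Returns the number of pauses taken. `sleep` is injectable so the
    schedule can be checked without actually waiting.
    """
    pauses = 0
    for round_no in range(1, total_rounds + 1):
        run_round(round_no)  # hypothetical: one crash-and-recover cycle
        if round_no % pause_every == 0:
            sleep(pause_seconds)
            pauses += 1
    return pauses
```

With 10,000 rounds this schedule pauses 100 times, so autovacuum gets a window after every 100 crashes rather than only at the end of the run.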
Cheers,
Jeff