From: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Write Ahead Logging for Hash Indexes |
Date: | 2016-08-24 17:02:28 |
Message-ID: | CAMkU=1wiPnV2d+5JauFoVMN=GkYKvRhjN9m8Huy_4+y1rw+LPQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Aug 23, 2016 at 10:05 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Aug 24, 2016 at 2:37 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>
> >
> > After an intentionally created crash, I get an Assert triggering:
> >
> > TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
> >
> > freep[0] is zero and bitmapbit is 16.
> >
>
> Here is what is happening: when it tries to clear the bitmapbit,
> it expects the bit to be set. I think the reason it didn't find
> the bit set could be that after the new overflow page was added
> and the bit corresponding to it was set, the system crashed, and
> replay did not set the bit again. Then, while freeing the overflow
> page, it can hit the Assert you mentioned. I think the problem
> here could be that I am using REGBUF_STANDARD to log the bitmap
> page updates. As the bitmap page doesn't follow the standard page
> layout, the actual contents would have been omitted while taking
> the full page image, and then during replay the bit would not have
> been set, because the page doesn't need REDO.
> I think the fix here is to use REGBUF_NO_IMAGE, as we do for vm
> buffers.
>
> If you can send me the detailed steps for how you produced the
> problem, then after fixing it I can verify whether you are seeing
> the same problem or something else.
>
The test is rather awkward; it might be easier to just have me test it,
but I've attached it.
There is a patch that needs to be applied and compiled (alongside your
patches, of course) to inject the crashes; a Perl script which creates
the schema and does the updates; and a shell script which sets up the
cluster with the appropriate parameters and then calls the Perl script in
a loop.
The top of the shell script has some hard-coded paths to the binaries and
to the test data directory (which is automatically deleted).
I run it like "sh do.sh >& do.err &".
It gives two different types of assertion failures:
$ fgrep TRAP: do.err |sort|uniq -c
     21 TRAP: FailedAssertion("!(((freep)[(bitmapbit)/32] & (1<<((bitmapbit)%32))))", File: "hashovfl.c", Line: 553)
     32 TRAP: FailedAssertion("!(RefCountErrors == 0)", File: "bufmgr.c", Line: 2506)
The second one is related to the intentional crashes, and so is not
relevant to you.
Cheers,
Jeff
Attachment | Content-Type | Size |
---|---|---|
count.pl | application/octet-stream | 8.5 KB |
crash_REL10.patch | application/octet-stream | 12.9 KB |
do.sh | application/x-sh | 4.3 KB |