From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: BUG #6425: Bus error in slot_deform_tuple |
Date: | 2012-02-04 16:11:43 |
Message-ID: | CA+U5nMJkaLowf=Vksbh30MBHMQdT2D65fwZfTWF6SQfbT8429A@mail.gmail.com |
Lists: | pgsql-bugs pgsql-hackers |

On Fri, Feb 3, 2012 at 6:45 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> I have not gotten very far with the coredump, except to observe that
>> gdb says the Assert ought to have passed: ...
>> This suggests very strongly that indeed the buffer was changing under
>> us.
>
> I probably ought to let the test case run overnight before concluding
> anything, but at this point it's run for two-plus hours with no errors
> after applying this patch:
>
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index cce87a3..b128bfd 100644
> *** a/src/backend/access/transam/xlog.c
> --- b/src/backend/access/transam/xlog.c
> *************** RestoreBkpBlocks(XLogRecPtr lsn, XLogRec
> *** 3716,3724 ****
> }
> else
> {
> - /* must zero-fill the hole */
> - MemSet((char *) page, 0, BLCKSZ);
> memcpy((char *) page, blk, bkpb.hole_offset);
> memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
> blk + bkpb.hole_offset,
> BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
> --- 3716,3724 ----
> }
> else
> {
> memcpy((char *) page, blk, bkpb.hole_offset);
> + /* must zero-fill the hole */
> + MemSet((char *) page + bkpb.hole_offset, 0, bkpb.hole_length);
> memcpy((char *) page + (bkpb.hole_offset + bkpb.hole_length),
> blk + bkpb.hole_offset,
> BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
>
>
> The existing code makes the page state transiently invalid (all zeroes)
> for no particularly good reason, and consumes useless cycles to do so,
> so this would be a good change in any case. The reason it is relevant
> to our current problem is that even though RestoreBkpBlocks faithfully
> takes exclusive lock on the buffer, *that is not enough to guarantee
> that no one else is touching that buffer*. Another backend that has
> already located a visible tuple on a page is entitled to keep accessing
> that tuple with only a buffer pin. So the existing code transiently
> wipes the data from underneath the other backend's pin.
>
> It's clear how this explains the symptoms

Yes, that looks like the murder weapon.
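
For anyone following along, here is a minimal standalone sketch of the two orderings in the quoted patch. It is not the xlog.c code: PAGE_SIZE (a stand-in for BLCKSZ, assumed to be 8192 here) and both function names are made up for illustration, and there is no locking or pinning at all. It only shows that the two orderings leave the page in the same final state, while the old one passes through an all-zeroes page on the way there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for BLCKSZ; 8192 is an assumption for this sketch. */
#define PAGE_SIZE 8192

/*
 * Old ordering: zero the whole page first, then copy the two data segments.
 * Between the memset and the memcpy calls every byte of the page is zero,
 * so a reader holding only a pin could observe wiped tuple data.
 */
static void
restore_zero_whole_page(char *page, const char *blk,
                        size_t hole_offset, size_t hole_length)
{
    assert(hole_offset + hole_length <= PAGE_SIZE);

    memset(page, 0, PAGE_SIZE);
    memcpy(page, blk, hole_offset);
    memcpy(page + hole_offset + hole_length,
           blk + hole_offset,
           PAGE_SIZE - (hole_offset + hole_length));
}

/*
 * New ordering: copy the leading data, zero only the hole, copy the rest.
 * Bytes outside the hole are only ever overwritten with the saved image.
 */
static void
restore_zero_hole_only(char *page, const char *blk,
                       size_t hole_offset, size_t hole_length)
{
    assert(hole_offset + hole_length <= PAGE_SIZE);

    memcpy(page, blk, hole_offset);
    memset(page + hole_offset, 0, hole_length);
    memcpy(page + hole_offset + hole_length,
           blk + hole_offset,
           PAGE_SIZE - (hole_offset + hole_length));
}
```

Calling both functions on identical buffers with the same blk, hole_offset and hole_length should give byte-for-byte identical pages. The only difference is what a concurrent reader could see partway through.
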
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services