From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
---|---|
To: | Kevin Grittner <kgrittn(at)ymail(dot)com> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Race condition in b-tree page deletion |
Date: | 2014-01-27 14:34:44 |
Message-ID: | 52E66E84.2050109@vmware.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the review!
On 01/18/2014 09:45 PM, Kevin Grittner wrote:
> The patch still applies with a few offsets. It compiles with these
> warnings (which were also reported by Peter Eisentraut and should
> be fixed):
>
> nbtpage.c: In function ‘_bt_pagedel’:
> nbtpage.c:1695:24: warning: ‘metad’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> nbtpage.c:1414:18: note: ‘metad’ was declared here
> nbtpage.c:1730:4: warning: ‘metapg’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> nbtpage.c:1413:8: note: ‘metapg’ was declared here
Fixed.
> I'm not a fan of the new retry label introduced in _bt_pagedel()
> and the goto statement which references it. I think a loop using
> while() or for() would be cleaner. The whole idea of describing
> this logic recursively and then making a big deal about using tail
> recursion as an optimization seems pointless and confusing. It
> could just as easily be described as an iterative approach and save
> the extra logical step in explaining the code. I think that would
> be clearer and easier to understand. It would also eliminate the
> last parameter, which is always passed as NULL and only set to
> something else internally. I think it would be cleaner to make
> that a local variable initialized to NULL as part of eliminating
> the fiction of recursion here.
Ok, changed it to a loop, and moved the responsibility to construct the
'stack' into _bt_pagedel().
> There is a point in this loop where we return 0, but I think that
> is wrong; I think we should break so that the pages already deleted
> will be counted.
Fixed.
> On a related note. the caller doesn't actually
> use the count, but assumes that any non-zero value should be
> treated as 1.
Hmm. The caller does this:
> ndel = _bt_pagedel(rel, buf);
>
> /* count only this page, else may double-count parent */
> if (ndel)
> stats->pages_deleted++;
The problem here is that btvacuumpage() might visit the parent page
again, and count it again. The code is erring on the side of
undercounting; I'm not sure which is better.
> Similar issues with faux recursion existing the calling function,
> btvacuumpage(). In that case the unnecessary parameter which the
> caller must get right is orig_blkno, which must be the same as
> blkno on an actual call. That could be a local variable
> initialized to blkno within the function. I think this would be
> better described as iteration directly rather than claiming
> recursions and whining about how the compiler is too dumb to turn
> it into iteration for us. It's not the job of this patch to fix
> that in the caller, but let's not emulate it.
Agreed.
> Other than that, I have not found any problems.
While fixing the above issues, and reviewing this again in general, I
noticed that btree_xlog_unlink_page() was a few bricks shy of a load.
When unlinking an internal page, it didn't update the pointer to the new
topmost page in the to-be-deleted chain. Fixed that.
Attached is a new version. I also re-worded the README changes; I hope
it reads better now.
- Heikki
Attachment | Content-Type | Size |
---|---|---|
fix-btree-page-deletion-3.patch | text/x-diff | 64.1 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Heikki Linnakangas | 2014-01-27 14:39:19 | Re: Race condition in b-tree page deletion |
Previous Message | Amit Langote | 2014-01-27 14:24:40 | Re: Retain dynamic shared memory segments for postmaster lifetime |