From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: B-tree page deletion boundary cases |
Date: | 2012-04-24 08:08:36 |
Message-ID: | CA+U5nM+Ra=3X269jC6x4fr4dVYj=rBAOS=y9FwO_kmhBuh5eCA@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Apr 21, 2012 at 5:52 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> As I mentioned[1] peripherally back in November, that algorithm has been
> insufficient since the introduction of non-XID-bearing transactions in
> PostgreSQL 8.3. Such transactions do not restrain RecentXmin. If no running
> transaction has an XID, RecentXmin == ReadNewTransactionId() and the page
> incorrectly becomes available for immediate reuse.
Good observation.
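
To make the quoted scenario concrete, here is a minimal sketch of the two horizons. It is a simplified model, not the PostgreSQL implementation: the `Backend` struct, `horizon_running_xids()`, and `horizon_global()` are hypothetical names invented for illustration, XID wraparound is ignored, and the deleted page is assumed to be stamped with the next unassigned XID at deletion time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Xid;           /* simplified: wraparound is ignored here */
#define NO_XID 0u

/* Hypothetical model of one backend's state. */
typedef struct {
    Xid xid;    /* assigned XID, or NO_XID for a non-XID-bearing transaction */
    Xid xmin;   /* xmin of its snapshot, or NO_XID if it holds no snapshot */
} Backend;

/* RecentXmin-like horizon: oldest XID among running XID-bearing transactions. */
static Xid horizon_running_xids(const Backend *b, size_t n, Xid next_xid)
{
    Xid h = next_xid;
    for (size_t i = 0; i < n; i++)
        if (b[i].xid != NO_XID && b[i].xid < h)
            h = b[i].xid;
    return h;
}

/* RecentGlobalXmin-like horizon: also honors every backend's snapshot xmin. */
static Xid horizon_global(const Backend *b, size_t n, Xid next_xid)
{
    Xid h = horizon_running_xids(b, n, next_xid);
    for (size_t i = 0; i < n; i++)
        if (b[i].xmin != NO_XID && b[i].xmin < h)
            h = b[i].xmin;
    return h;
}

/* A reader with no XID took its snapshot when the next XID was 100.
 * A writer (XID 100) then deleted a page, stamped it 101, and committed. */
void scenario_reader_without_xid(void)
{
    Backend backends[] = { { NO_XID, 100 } };   /* only the reader remains */
    Xid next_xid = 101;
    Xid btpo_xact = 101;

    Xid recent_xmin = horizon_running_xids(backends, 1, next_xid);
    Xid global_xmin = horizon_global(backends, 1, next_xid);

    assert(recent_xmin == 101);         /* the reader does not hold it back */
    assert(btpo_xact <= recent_xmin);   /* page looks reusable immediately */
    assert(global_xmin == 100);         /* the reader's snapshot is honored */
    assert(!(btpo_xact < global_xmin)); /* page must wait for the reader */
}
```

In this model the reader may still be in flight to the deleted page, yet only the horizon that accounts for snapshot xmin values keeps the page from being reused.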
> The fix is to compare the stored XID to RecentGlobalXmin, not RecentXmin. We
> already use RecentGlobalXmin when wal_level = hot_standby. If no running
> transaction has an XID and all running transactions began since the last
> transaction that did bear an XID, RecentGlobalXmin == ReadNewTransactionId().
> Therefore, the correct test is btpo.xact < RecentGlobalXmin, not btpo.xact <=
> RecentGlobalXmin as we have today. This also cleanly removes the need for the
> bit of code in _bt_getbuf() that decrements btpo.xact before sending it down
> for ResolveRecoveryConflictWithSnapshot(). I suggested[2] that decrement on
> an unprincipled basis; it was just masking the off-by-one of using "<=
> RecentGlobalXmin" instead of "< RecentGlobalXmin" in _bt_page_recyclable().
Looks like the right fix. I'll apply this to 9.0/9.1/HEAD.
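
The off-by-one described in the quoted paragraph can be sketched as follows. This is an illustration under stated assumptions, not the code in `_bt_page_recyclable()`: `xid_precedes()`, `recyclable_le()`, and `recyclable_lt()` are hypothetical helpers, the comparison is a simplified modulo-2^32 ordering for normal XIDs only, and the page is assumed to be stamped with the next unassigned XID when it is deleted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;

/* Simplified wraparound-aware "a is older than b" for normal XIDs. */
static bool xid_precedes(Xid a, Xid b)
{
    return (int32_t)(a - b) < 0;
}

/* The "<=" form: reusable when btpo_xact <= horizon. */
static bool recyclable_le(Xid btpo_xact, Xid horizon)
{
    return !xid_precedes(horizon, btpo_xact);
}

/* The proposed "<" form: reusable only when btpo_xact < horizon. */
static bool recyclable_lt(Xid btpo_xact, Xid horizon)
{
    return xid_precedes(btpo_xact, horizon);
}

/* No running transaction has an XID and all snapshots are fresh, so the
 * horizon equals the next unassigned XID, which is also the page's stamp. */
void scenario_idle_system(void)
{
    Xid next_xid = 500;
    Xid btpo_xact = next_xid;   /* stamped at deletion */
    Xid horizon = next_xid;

    assert(recyclable_le(btpo_xact, horizon));      /* immediate reuse */
    assert(!recyclable_lt(btpo_xact, horizon));     /* page still protected */
    assert(recyclable_lt(btpo_xact, horizon + 1));  /* reusable once the horizon advances */
}
```

With the strict comparison, a page stamped with the current horizon is never handed out in the same instant it was deleted, which is the case the quoted decrement in `_bt_getbuf()` was compensating for.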
> This change makes empty B-tree pages wait through two generations of running
> transactions before reuse, so some additional bloat will arise.
Could arise, in some circumstances. But that assumes VACUUMs are
fairly frequent and that they would be delayed/rendered less effective
by this, which I don't think will be the case.
I note that we don't take any account of the number of pages that may
be reused when we VACUUM, so when HOT avoids a VACUUM we may
accumulate pages for a longer period. Looks like there is more work to
be done yet in cleaning indexes.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services