From: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Avoiding pin scan during btree vacuum |
Date: | 2016-01-03 15:34:53 |
Message-ID: | CANP8+jKsmSLuPssH6ivuznDVnSotJxn0LMz+e8PsphsC0xyMtw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 21 December 2015 at 21:28, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:
> Simon Riggs wrote:
> > During VACUUM of btrees, we need to pin all pages, even those where
> tuples
> > are not removed, which I am calling here the "pin scan". This is
> especially
> > a problem during redo, where removing one tuple from a 100GB btree can
> take
> > a minute or more. That causes replication lags. Bad Thing.
>
> Agreed on there being a problem here.
>
> As a reminder, this "pin scan" is implemented in the
> HotStandbyActiveInReplay() block in btree_xlog_vacuum; this routine is
> in charge of replaying an action recorded by _bt_delitems_vacuum. That
> replay works by acquiring cleanup lock on all btree blocks from
> lastBlockVacuumed to "this one" (i.e. the one in which elements are
> being removed). In essence, this means pinning *all* the blocks in the
> index, which is bad.
> The new code sets lastBlockVacuumed to Invalid; a new test in the replay
> code makes that value not pin anything. This is supposed to optimize
> the case in question.
>
Nice summary, thanks.
> One thing that this patch should update is the comment above struct
> xl_btree_vacuum, which currently state that EnsureBlockUnpinned() is one
> of the options to apply to each block, but that routine doesn't exist.
>
Updated
> One possible naive optimization is to avoid locking pages that aren't
> leaf pages, but that would still require fetching the block from disk,
> so it wouldn't actually optimize anything other than the cleanup lock
> acquisition. (And probably not too useful, since leaf pages are said to
> be 95% - 99% of index pages anyway.)
>
Agreed, not worth it.
> Also of interest is the comment above the call to _bt_delitems_vacuum in
> btvacuumpage(); that needs an update too.
>
Updated
> It appears to me that your patch changes the call in btvacuumscan() but
> not the one in btvacuumpage(). I assume you should be changing both.
>
Yes, that was correct. Updated.
> I think the new comment that talks about Toast Index should explain
> *why* we can skip the pinning in all cases except that one, instead of
> just saying we can do it.
>
Greatly expanded comments.
Thanks for the review.
New patch attached.
--
Simon Riggs http://www.2ndQuadrant.com/
<http://www.2ndquadrant.com/>
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment | Content-Type | Size |
---|---|---|
avoid_standby_pin_scan.v2.patch | application/octet-stream | 5.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2016-01-03 15:40:01 | Re: Avoiding pin scan during btree vacuum |
Previous Message | Robert Haas | 2016-01-03 14:57:05 | Re: dynloader.h missing in prebuilt package for Windows? |