From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Ideriha, Takeshi" <ideriha(dot)takeshi(at)jp(dot)fujitsu(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com> |
Subject: | Re: GUC for cleanup indexes threshold. |
Date: | 2017-03-03 16:13:17 |
Message-ID: | CAD21AoB1KqXEZh61b18q7cEjfj14e6RWdwvyjh_tYVRfsrp-Xw@mail.gmail.com |
Lists: | pgsql-hackers |
On Sat, Feb 25, 2017 at 7:10 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Fri, Feb 24, 2017 at 9:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think this thread is pretty short on evidence that would let us make
>> a smart decision about what to do here. I see three possibilities.
>> The first is that this patch is a good idea whether we do something
>> about the issue of half-dead pages or not. The second is that this
>> patch is a good idea if we do something about the issue of half-dead
>> pages but a bad idea if we don't. The third is that this patch is a
>> bad idea whether or not we do anything about the issue of half-dead
>> pages.
>
> Half-dead pages are not really relevant to this discussion, AFAICT. I
> think that both you and Simon mean "recyclable" pages. There are
> several levels of indirection involved here, to keep the locking very
> granular, so it gets tricky to talk about.
>
> B-Tree page deletion is like a page split in reverse. It has a
> symmetry with page splits, which have two phases (atomic operations).
> There are also two phases for deletion, the first of which leaves the
> target page without a downlink in its parent, and marks it half dead.
> By the end of the first phase, there are still sibling pointers, so an
> index scan can land on them before the second phase of deletion begins
> -- they can visit a half-dead page before such time as the second
> phase of deletion begins, where the sibling link goes away. So, the
> sibling link isn't stale as such, but the page is still morally dead.
> (Second phase is where we remove even the sibling links, and declare
> it fully dead.)
>
> Even though there are two phases of deletion, the second still occurs
> immediately after the first within VACUUM. The need to have two phases
> is hard to explain, so I won't try, but it suffices to say that VACUUM
> does not actually ever leave a page half dead unless there is a hard
> crash.
>
> Recall that VACUUMing of a B-Tree is performed sequentially, so blocks
> can be recycled without needing to be found via a downlink or sibling
> link by VACUUM. What is at issue here, then, is VACUUM's degree of
> "eagerness" around putting *fully* dead B-Tree pages in the FSM for
> recycling. The interlock with RecentGlobalXmin is what makes it
> impossible for VACUUM to generally fully delete pages, *as well as*
> mark them as recyclable (put them in the FSM) all at once.
>
> Maybe you get this already, since, as I said, the terminology is
> tricky in this area, but I can't tell.
>
Thank you for the clarification. Let me check my understanding. IIUC,
skipping the second index vacuum pass (lazy_cleanup_index) cannot
leave a page in the half-dead state, but it can leave recyclable pages
that have not yet been marked as recyclable (put in the FSM). Such
pages can still be reclaimed by the next index vacuum, because
btvacuumpage calls RecordFreeIndexPage for each recyclable page. Am I
missing something?
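To make the question above concrete, here is a toy model of the per-page decision. It is only a sketch, not the actual nbtree code: the enum, struct, and function names are hypothetical, and transaction IDs are compared as plain integers with no wraparound handling. RecordFreeIndexPage and RecentGlobalXmin appear only in comments, to show which role each piece plays.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page states, simplified from the discussion above. */
typedef enum { PAGE_LIVE, PAGE_HALF_DEAD, PAGE_DELETED } ToyPageState;

typedef struct
{
    ToyPageState state;
    uint32_t     deleted_xid;   /* meaningful only for PAGE_DELETED */
} ToyPage;

/*
 * A fully deleted page may be recycled only once its deletion XID is
 * older than the oldest xmin (the role RecentGlobalXmin plays in the
 * text). Assumption: plain integer comparison, no XID wraparound.
 */
static bool
toy_page_recyclable(const ToyPage *page, uint32_t oldest_xmin)
{
    return page->state == PAGE_DELETED && page->deleted_xid < oldest_xmin;
}

/*
 * One sequential pass over the index. Returns how many pages would be
 * handed to the FSM (where the real code calls RecordFreeIndexPage).
 */
static int
toy_vacuum_pass(const ToyPage *pages, int npages, uint32_t oldest_xmin)
{
    int nfree = 0;

    for (int i = 0; i < npages; i++)
    {
        if (toy_page_recyclable(&pages[i], oldest_xmin))
            nfree++;
    }
    return nfree;
}
```

In this model, a deleted page that is too recent to recycle in one pass is simply counted by a later pass once the oldest xmin has advanced, which is the behaviour the question assumes.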
My original motivation for this patch was to skip the second index
vacuum pass when vacuum skipped the whole table because of the
visibility map. But as Robert suggested on another thread, I changed it
to use a threshold. If my understanding is correct, we can have a
threshold that specifies the fraction of the table's pages scanned by
vacuum. If it's set to 0.1, lazy_scan_heap does the second index vacuum
pass only when 10% of the table is scanned. IOW, if 90% of the table's
pages are skipped, which means almost all of the table has not changed
since the previous vacuum, we can skip the second index vacuum pass.
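The threshold check described above could be sketched roughly as follows. This is not code from the patch: the function name and parameters are hypothetical, and the handling of the boundary case (scanned fraction exactly equal to the threshold runs the cleanup) is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether the second index vacuum pass
 * (lazy_cleanup_index) should run.
 *
 *   scanned_pages  heap pages actually scanned by this vacuum
 *   rel_pages      total heap pages in the table
 *   threshold      required scanned fraction, e.g. 0.1 for 10%
 *
 * Assumption: the pass runs when scanned_pages / rel_pages >= threshold.
 * The comparison is written as a multiplication so that an empty table
 * (rel_pages == 0) does not divide by zero; in that case it returns true
 * for any non-negative threshold.
 */
static bool
should_cleanup_indexes(unsigned long scanned_pages,
                       unsigned long rel_pages,
                       double threshold)
{
    return (double) scanned_pages >= threshold * (double) rel_pages;
}
```

With a threshold of 0.1, a vacuum that scanned 50 of 1000 pages would skip the cleanup pass, while one that scanned 200 of 1000 would run it. A threshold of 0 always runs the pass in this sketch.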
In this design, we could handle other types of AM as well.
Regards,
--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center