From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: New IndexAM API controlling index vacuum strategies
Date: 2021-02-10 02:14:13
Message-ID: CAD21AoAvh=qrYMmBjC1L31a4rP5qSup31uZgCnXojOSE4RJ_Mw@mail.gmail.com
Lists: pgsql-hackers
On Sat, Feb 6, 2021 at 5:02 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Feb 3, 2021 at 8:18 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > I'm starting to think that the right short term goal should not
> > > directly involve bottom-up index deletion. We should instead return to
> > > the idea of "unifying" the vacuum_cleanup_index_scale_factor feature
> > > with the INDEX_CLEANUP feature, which is kind of where this whole idea
> > > started out at. This short term goal is much more than mere
> > > refactoring. It is still a whole new user-visible feature. The patch
> > > would teach VACUUM to skip doing any real index work within both
> > > ambulkdelete() and amvacuumcleanup() in many important cases.
> > >
> >
> > I agree to cut the scope. I've also been thinking about the impact of
> > this patch on users.
>
> It's probably also true that on balance users care more about the
> "~99.9% append-only table" case (as well as the HOT updates workload I
> brought up in response to Victor on February 2) than making VACUUM
> very sensitive to how well bottom-up deletion is working. Yes, it's
> annoying that VACUUM still wastes effort on indexes where bottom-up
> deletion alone can do all required garbage collection. But that's not
> going to be a huge surprise to users. Whereas the "~99.9% append-only
> table" case causes huge surprises to users -- users hate this kind of
> thing.
Agreed.
>
> > If vacuum skips both ambulkdelete and amvacuumcleanup in that case,
> > I'm concerned that this could increase the number of users who are
> > affected by the known issue of leaking deleted pages. Currently, the
> > users affected by that issue are only those who use INDEX_CLEANUP
> > off. But if we enable this feature by default, all users are
> > potentially affected by it.
>
> FWIW I think that it's unfair to blame INDEX_CLEANUP for any problems
> in this area. The truth is that the design of the
> deleted-page-recycling stuff has always caused leaked pages, even with
> workloads that should not be challenging to the implementation in any
> way. See my later remarks.
>
> > To improve index tuple deletion in that case, skipping bulkdelete is
> > a good idea, and retail index deletion is another possible solution.
> > I had thought retail index deletion would be appropriate for this
> > case, but since some index AMs cannot support it, skipping index
> > scans is a good solution anyway.
>
> The big problem with retail index tuple deletion is that it is not
> possible once heap pruning takes place (opportunistic pruning, or
> pruning performed by VACUUM). Pruning will destroy the information
> that retail deletion needs to find the index tuple (the column
> values).
Right.
>
> I think that we probably will end up using retail index tuple
> deletion, but it will only be one strategy among several. We'll never
> be able to rely on it, even within nbtree. My personal opinion is that
> completely eliminating VACUUM is not a useful long term goal.
Totally agreed. We cannot rely on it. It would be a good way to delete
a small amount of index garbage tuples, but its usage is limited.
>
> > > Here is how the triggering criteria could work: maybe skipping
> > > accessing all indexes during VACUUM happens when less than 1% or
> > > 10,000 of the items from the table are to be removed by VACUUM --
> > > whichever is greater. Of course this is just the first thing I thought
> > > of. It's a starting point for further discussion.
> >
> > I also prefer your second idea :)
>
> Cool. Yeah, I always like it when the potential downside of a design
> is obviously low, and the potential upside is obviously very high. I
> am much less concerned about any uncertainty around when and how users
> will get the big upside. I like it when my problems are "luxury
> problems". :-)
>
> > As I mentioned above, I'm still concerned that the extent of users who
> > are affected by the issue of leaking deleted pages could get expanded.
> > Currently, we don't have a way to detect how many index pages are
> > leaked. If there are potential cases where many deleted pages are
> > leaked this feature would make things worse.
>
> The existing problems here were significant even before you added
> INDEX_CLEANUP. For example, let's say VACUUM deletes a page, and then
> later recycles it in the normal/correct way -- this is the simplest
> possible case for page deletion. The page is now in the FSM, ready to
> be recycled during the next page split. Or is it? Even in this case
> there are no guarantees! This is because _bt_getbuf() does not fully
> trust the FSM to give it a 100% recycle-safe page for its page split
> caller -- _bt_getbuf() checks the page using _bt_page_recyclable()
> (which is the same check that VACUUM does to decide a deleted page is
> now recyclable). Obviously this means that the FSM can "leak" a page,
> just because there happened to be no page splits before wraparound
> occurred (and so now _bt_page_recyclable() thinks the very old page is
> very new/in the future).
>
> In general the recycling stuff feels ridiculously over engineered to
> me. It is true that page deletion is intrinsically complicated, and is
> worth having -- that makes sense to me. But the complexity of the
> recycling stuff seems ridiculous.
>
> There is only one sensible solution: follow the example of commit
> 6655a7299d8 in nbtree. This commit fully fixed exactly the same
> problem in GiST by storing an epoch alongside the XID. This nbtree fix
> is even anticipated by the commit message of 6655a7299d8. I can take
> care of this myself for Postgres 14.
Thanks. I think it would be very good to resolve this recycling stuff
first and then try the new approach of skipping index vacuuming in more
cases. That way, even if the vacuum strategy work took a very long time
to get committed, spanning several major versions, we would not be
affected by the deleted-page recycling problem in nbtree (at least for
built-in index AMs). Also, the approach of 6655a7299d8 itself is a good
improvement and seems straightforward to me.
>
> > I agreed to cut the scope for PG14. It would be good if we could
> > improve index vacuuming while cutting the scope for PG14 and not
> > expanding the extent of the impact of this issue.
>
> Great! Well, if I take care of the _bt_page_recyclable()
> wraparound/epoch issue in a general kind of way then AFAICT there is
> no added risk.
Agreed!
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/