Re: xid wraparound danger due to INDEX_CLEANUP false

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: xid wraparound danger due to INDEX_CLEANUP false
Date: 2020-11-20 03:56:18
Message-ID: CAH2-WznhCnL5gtWXG-YbtMUxR8kzcXcWWOVSbNpw3nYG4TwsvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 19, 2020 at 5:58 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> Several months passed from the discussion. We decided not to do
> anything on back branches but IIUC the fundamental issue is not fixed
> yet. The issue pointed out by Andres that we should leave the index AM
> to decide whether doing vacuum cleanup or not when INDEX_CLEANUP is
> specified is still valid. Is that right?

I don't remember if Andres actually said that publicly, but I
definitely did. I do remember discussing this with him privately, at
which point he agreed with what I said. Which you just summarized
well.

> For HEAD, there was a discussion that we change lazy vacuum and
> bulkdelete and vacuumcleanup APIs so that it calls these APIs even
> when INDEX_CLEANUP is specified. That is, when INDEX_CLEANUP false is
> specified, it collects dead tuple TIDs into maintenance_work_mem space
> and passes the flag indicating INDEX_CLEANUP is specified or not to
> index AMs.

Right.

> Index AM decides whether doing bulkdelete/vacuumcleanup. A
> downside of this idea would be that we will end up using
> maintenance_work_mem even if all index AMs of the table don't do
> bulkdelete/vacuumcleanup at all.

That is a downside, but I don't think that it's a serious downside.
But it may not matter, because there are lots of reasons to move in
this direction.

> The second idea I came up with is to add an index AM API (say,
> amcanskipindexcleanup = true/false) telling index cleanup is skippable
> or not. Lazy vacuum checks this flag for each index on the table
> before starting. If index cleanup is skippable in all indexes, it can
> choose one-pass vacuum, meaning no need to collect dead tuple TIDs in
> maintenance_work_mem. All in-core index AM will set to true. Perhaps
> it’s true (skippable) by default for backward compatibility.

(The terminology here is very confusing, because the goal of the
INDEX_CLEANUP feature in v12 is not really to skip a call to
btvacuumcleanup(). The goal is really to skip a call to
btbulkdelete(). I will try to be careful.)

I think that the ideal design here would be a new hybrid of two
existing features:

1.) Your INDEX_CLEANUP feature from Postgres 12.

and:

2.) Your vacuum_cleanup_index_scale_factor feature from Postgres 11.

The INDEX_CLEANUP feature is very useful, because skipping indexes
entirely can be very helpful for many reasons (e.g. faster manual
VACUUM in the event of wraparound related emergencies). But
INDEX_CLEANUP has 2 problems today:

A. It doesn't interact well with vacuum_cleanup_index_scale_factor.
This is the problem that has been discussed on this thread.

and:

B. It is an "all or nothing" thing. Unlike the
vacuum_cleanup_index_scale_factor feature, it does not care about what
the index AM/individual index wants. But it should.

(**Thinks some more***)

Actually, on second thought, maybe INDEX_CLEANUP only has one problem.
Problem A is actually just a special case of problem B. There are many
interesting opportunities created by solving problem B
comprehensively.

So, what useful enhancements to VACUUM are possible once we have
something like INDEX_CLEANUP, that is sensitive to the needs of
indexes? Well, you already identified one yourself, so obviously
you're thinking about this in a similar way already:

> The in-core AMs including btree indexes will work same as before. This
> fix is to make it more desirable behavior and possibly to help other
> AMs that require to call vacuumcleanup in all cases. Once we fix it I
> wonder if we can disable index cleanup when autovacuum’s
> anti-wraparound vacuum.

Obviously this is a good idea. The fact that anti-wraparound vacuum
isn't really special compared to regular autovacuum is *bad*.
Obviously anti-wraparound is in some sense more important than regular
vacuum. Making it as similar as possible to vacuum simply isn't
helpful. Maybe it is slightly more elegant in theory, but in the real
world it is a poor design. (See also: every single PostgreSQL post
mortem that has ever been written.)

But why stop with this? There are other big advantages to allowing
individual indexes/index AMs influence of the INDEX_CLEANUP behavior.
Especially if they're sensitive to the needs of particular indexes on
a table (not just all of the indexes on the table taken together).

As you may know, my bottom-up index deletion patch can more or less
eliminate index bloat in indexes that don't get "logically changed" by
many non-HOT updates. It's *very* effective with non-HOT updates and
lots of indexes. See this benchmark result for a recent example:

https://postgr.es/m/CAGnEbohYF_K6b0v=2uc289=v67qNhc3n01Ftic8X94zP7kKqtw@mail.gmail.com

The feature is effective enough to make it almost unnecessary to
VACUUM certain indexes -- though not necessarily other indexes on the
same table. Of course, in the long term it will eventually be
necessary to really vacuum these indexes. Not because the indexes
themselves care, though -- they really don't (if they don't receive
logical changes from non-HOT updates, and so benefit very well from
the proposed bottom-up index deletion mechanism, they really have no
selfish reason to care if they ever get vacuumed by autovacuum).

The reason we eventually need to call ambulkdelete() with these
indexes (with all indexes, actually) even with these enhancements is
related to the heap. We eventually want to make LP_DEAD line pointers
in the heap LP_UNUSED. But we should be lazy about it, and wait until
it becomes a real problem. Maybe we can only do a traditional VACUUM
(with a second pass of the heap for heap LP_UNUSED stuff) much much
less frequently than today. At the same time, we can set the FSM for
index-only scans much more frequently.

It's also important that we really make index vacuuming a per-index
thing. You can see this in the example benchmark I linked to, which
was posted by Victor: no page splits in one never-logically-modified
index, and some page splits in other indexes that were actually
changed by UPDATEs again and again. Clearly you can have several
different indexes on the same table that have very different needs.

With some indexes we want to be extra lazy (these are indexes that
make good use of bottom-up deletion). But with other indexes on the
same table we want to be eager. Maybe even very eager. If we can make
per-index decisions, then every individual part of the system works
well. Currently, the problem with autovacuum scheduling is that it
probably makes sense for the heap with the defaults (or something like
them), and probably doesn't make any sense for indexes (though it also
varies among indexes). So today we have maybe 7 different things
(maybe 6 indexes + 1 table), and we pretend that they are only one
thing. It's just a fantasy. The reality is that we have 7 things that
have only a very loose and complicated relationship to each other. We
need to stop believing in this fantasy, and start paying attention to
the more complicated reality. The only way to do that is ask each
index directly, while being prepared to get very different answers
from each index on the same table.

Here is what I mean by that: it would also probably be very useful to
do something like a ambulkdelete() call for only a subset of indexes
that really need it. So you aggressively vacuum the one index that
really does get logically modified by an UPDATE, and not the other 6
that don't. (Of course it's still true that we cannot have a second
heap pass to make LP_DEAD line pointers in the heap LP_UNUSED --
obviously that's unsafe unless we're 100% sure that nothing in any
index points to the now-LP_UNUSED line pointer. But many big
improvements are possible without violating this basic invariant.)

If you are able to pursue this project, in whole or in part, I would
definitely be supportive of that. I may be able to commit it. I think
that this project has many benefits, not just one or two. It seems
strategic.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-11-20 04:02:13 Re: [PATCH 1/1] Fix compilation on mac with Xcode >= 11.4.
Previous Message Masahiko Sawada 2020-11-20 03:53:54 Re: VACUUM (DISABLE_PAGE_SKIPPING on)