Re: do only critical work during single-user vacuum?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: do only critical work during single-user vacuum?
Date: 2022-02-03 18:05:50
Message-ID: CA+TgmoYPfofQmRtUan=A3aWE9wFsJaOFr+W_ys2pPkNPr-2FZw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 9, 2021 at 8:56 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I think we should move *away* from single user mode, rather than the
> opposite. It's a substantial code burden and it's hard to use.

Yes. This thread seems to be largely devoted to the topic of making
single-user vacuum work better, but I don't see anyone asking the
question "why do we have a message that tells people to vacuum in
single user mode in the first place?". It's basically bad advice, with
one small exception that I'll talk about in a minute. Suppose we had a
message in the tree that said "HINT: Consider angering a live anaconda
to fix this problem." If that were so, the correct thing to do
wouldn't be to add a section to our documentation explaining how to
deal with angry anacondas. The correct thing to do would be to remove
the hint as bad advice that we never should have offered in the first
place. And so here. We should not try to make vacuum in single
user-mode work better or differently, or at least that shouldn't be
our primary objective. We should just stop telling people to do it. We
should probably add messages and documentation *discouraging* the use
of single user mode for recovering from wraparound trouble, exactly
the opposite of what we do now. There's nothing we can do in
single-user mode that we can't do equally well in multi-user mode. If
people try to fix wraparound problems in multi-user mode, they still
have read-only access to their database, they can use parallelism,
they can use command line utilities like vacuumdb, and they can use
psql which has line editing and allows remote access and is a way
nicer user experience than running postgres --single. We need a really
compelling reason to tell people to give up all those advantages, and
there is no such reason. It makes just as much sense as telling people
to deal with wraparound problems by angering a live anaconda.

I did say there was an exception, and it's this: the last time I
studied this issue back in 2019,[1] vacuum insisted on trying to
truncate tables even when the system is in wraparound danger. Then it
would fail, because truncating the table required allocating an XID,
which would fail if we were short on XIDs. By putting the system in
single user mode, you could continue to allocate XIDs and thus VACUUM
would work. However, if you think about this for even 10 seconds, you
can see that it's terrible. If we're so short of XIDs that we are
scared to allocate them for fear of causing an actual wraparound,
putting the system into a mode where that protection is bypassed is a
super-terrible idea. People will be able to run vacuum, yes, but if
they have too many tables, they will actually experience wraparound
and thus data loss before they process all the tables they have. What
we ought to do to solve this problem is NOT TRUNCATE when the number
of remaining XIDs is small, so that we don't consume any of the
remaining XIDs until we get the system out of wraparound danger. I
think the "failsafe" stuff Peter added in v14 fixes that, though. If
not, we should adjust it so it does. And then we should KILL WITH FIRE
the message telling people to use single user mode -- and once we do
that, the question of what the behavior ought to be when someone does
run VACUUM in single user mode becomes a lot less important.

This problem is basically self-inflicted. We have given people bad
advice (use single user mode) and then they suffer when they take it.
Ameliorating the suffering isn't the worst idea ever, but it's
basically fixing the wrong problem.

--
Robert Haas
EDB: http://www.enterprisedb.com

[1] http://postgr.es/m/CA+Tgmob1QCMJrHwRBK8HZtGsr+6cJANRQw2mEgJ9e=D+z7cOsw@mail.gmail.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-02-03 18:10:57 Re: Server-side base backup: why superuser, not pg_write_server_files?
Previous Message Zhihong Yu 2022-02-03 17:51:52 Re: Implementing Incremental View Maintenance