Re: Teaching users how they can get the most out of HOT in Postgres 14

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Teaching users how they can get the most out of HOT in Postgres 14
Date: 2021-06-15 00:23:21
Message-ID: YMfy+dcrcdk7EUSm@paquier.xyz
Lists: pgsql-hackers

On Fri, Jun 11, 2021 at 02:46:20PM -0700, Peter Geoghegan wrote:
> On Thu, Jun 3, 2021 at 11:15 PM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>> I have read through the patch, and I am surprised to see that this
>> only makes possible to control the optimization at relation level.
>> The origin of the complaints is that this index cleanup optimization
>> has been introduced as a new rule that gets enforced at *system*
>> level, so I think that we should have an equivalent with a GUC to
>> control the behavior for the whole system.
>
> *Why* does it have to work at the system level? I don't understand
> what you mean about the system level.

I mean that there is no GUC allowing one to enforce that this
optimization is *not* used, for all relations and all processes.
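To make the contrast concrete, here is a sketch of the two control surfaces being discussed (the first statement uses the real per-relation storage parameter; the GUC in the second does *not* exist, and its name is purely illustrative):

```sql
-- Real per-relation control: forcing index cleanup to ON disables the
-- bypass optimization, but only for this one table.
ALTER TABLE one_table SET (vacuum_index_cleanup = on);

-- Hypothetical system-wide control (no such GUC exists; name illustrative):
-- one setting that would disable the optimization for every relation.
ALTER SYSTEM SET vacuum_index_cleanup = on;
SELECT pg_reload_conf();
```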

> As Masahiko pointed out, adding a GUC isn't what we've done in other
> similar cases -- that's how DISABLE_PAGE_SKIPPING works, which was a
> defensive option that seems similar enough to what we want to add now.
> To give another example, the TRUNCATE VACUUM option (or the related
> reloption) can be used to disable relation truncation, a behavior that
> sometimes causes *big* issues in production. The truncate behavior is
> determined dynamically in most situations -- which is another
> similarity to the optimization we've added here.

> Why is this fundamentally different to those two things?

Because the situation looks completely different to me here. TRUNCATE
is intended as an option to avoid taking an exclusive lock when
truncating the relation file at the end of VACUUM. More importantly,
skipping the truncation is opt-in: it never happens unless one asks
for it at the relation level.
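For reference, that truncation behavior is controllable both per VACUUM run and per relation (the table name is only an example):

```sql
-- Skip the end-of-VACUUM truncation for a single run:
VACUUM (TRUNCATE OFF) some_table;

-- Or opt one relation out of truncation via its storage parameter:
ALTER TABLE some_table SET (vacuum_truncate = off);
```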

Here, what we have is a decision that is enforced by default, all the
time, without the user knowing about it. If there is a bug or an
issue with it, users would, under your proposal, be forced to change
the setting for each *relation*. If they miss some of those
relations, they may still run into problems without knowing it.
Changing the default behavior while offering no simple way to control
it looks unacceptable to me.

>> With what you are
>> presenting here, one could only disable the optimization for each
>> relation, one-by-one. If this optimization proves to be a problem,
>> it's just going to be harder for users to go through all the relations
>> and re-tune autovacuum. Am I missing something?
>
> Why would you expect autovacuum to run even when the optimization is
> unavailable (e.g. with Postgres 13)? After all, the specifics of when
> the bypass optimization kicks in make it very unlikely that ANALYZE
> will ever be able to notice enough dead tuples to trigger an
> autovacuum (barring antiwraparound and insert-driven autovacuums).
>
> There will probably be very few LP_DEAD items remaining. Occasionally
> there will be somewhat more LP_DEAD items, that happen to be
> concentrated in less than 2% of the table's blocks -- but block-based
> sampling by ANALYZE is likely to miss most of them and underestimate
> the total number. The sampling approach taken by acquire_sample_rows()
> ensures this with larger tables. With small tables the chances of the
> optimization kicking in are very low, unless perhaps fillfactor has
> been tuned very aggressively.
>
> There has never been a guarantee that autovacuum will be triggered
> (and do index vacuuming) in cases that have very few LP_DEAD items, no
> matter how the system has been tuned. The main reason why making the
> optimization behavior controllable is for the VACUUM command.
> Principally for hackers. I can imagine myself using the VACUUM option
> to disable the optimization when I was interested in testing VACUUM or
> space utilization in some specific, narrow way.
>
> Of course it's true that there is still some uncertainty about the
> optimization harming production workloads -- that is to be expected
> with an enhancement like this one. But there is still no actual
> example or test case that shows the optimization doing the wrong
> thing, or anything like it. Anything is possible, but I am not
> expecting there to be even one user complaint about the feature.
> Naturally I don't want to add something as heavyweight as a GUC, given
> all that.

Perhaps. What really scares me is the assumption that enforcing this
decision will *always* be fine. What I am trying to say here is that
it *may not* be fine for everybody, and that there should be an easy
way to turn it off if it proves to be a problem. I don't quite see
how that's an implementation problem: we already have many reloptions
that fall back to a GUC when they are not set explicitly.
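To illustrate the pattern I mean, here is an existing reloption that falls back to a GUC when unset (table name and values are only examples):

```sql
-- The GUC provides the system-wide default, e.g. in postgresql.conf:
--   autovacuum_vacuum_scale_factor = 0.2
-- The reloption overrides it only where it is set explicitly:
ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.01);
```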

I think that a more careful implementation choice would have been to
turn this optimization off by default, while offering an option to
turn it on at will.
--
Michael
