Re: index bloat estimation

From: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
To: Victor Sudakov <vas(at)sibptus(dot)ru>
Cc: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: index bloat estimation
Date: 2021-02-17 15:12:38
Message-ID: CAODZiv69O=pTg7RMew=9Q1d6xKaVxriSSHZ661tx-YKhbuvmQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Tue, Feb 16, 2021 at 9:27 PM Victor Sudakov <vas(at)sibptus(dot)ru> wrote:

> Keith Fiske wrote:
>
> [dd]
> >
> > You cannot run the actual scan on the replica, no. But it can be set to
> run
> > against the replica and just report the statistics so you can have a cron
> > set up to always run on the given system in case of failover. The
> > "--recovery_mode_norun" can be set so it will only run if the target
> system
> > is actually a primary.
> >
> > I could possibly see about letting this actually run against the replica,
> > however this can be a rather long running transaction depending on the
> size
> > of the tables involved. You can set the "--commit_rate" to avoid some of
> > that, but if you have really large tables, it can still run quite long.
> So
> > this sort of check is really best run against the primary to avoid issues
> > around having to allow long running queries on the replica (delayed
> > replication or even worse bloat buildup).
>
> If you have a dedicated replica for OLAP, even one running from a WAL
> archive (not from a replication slot), this is not an issue.
>
> So running pg_bloat_check against a replica would be very useful for
> some of us.
>
> --
> Victor Sudakov, VAS4-RIPE, VAS47-RIPN
> 2:5005/49(at)fidonet http://vas.tomsk.ru/
>
>
>
Actually this could still be an issue, but all really depends on the size
of the tables involved. Long running transactions on the replica have the
potential to either delay replication entirely
(max_standby_archive_delay, max_standby_streaming_delay) or cause more
bloat than normal on the primary (hot_standby_feedback). The latter is more
often used to avoid the replication delay, but I have frequently seen
people push long running transactions that seem to be a "problem" onto the
replicas and don't realize that it doesn't solve all the problems of
actually running those queries on the primary. You still have a transaction
causing autovacuum to not be able to run efficiently.

The approximate/quick mode of pgstattuple could certainly help with this
problem, but as the issue you opened up on the github repo pointed out,
that skips over scanning toast tables (
https://github.com/keithf4/pg_bloat_check/issues/22) and also does not work
against indexes which are more often the problem with bloat and query
performance. I have seen significant bloat forming in the toast tables
(hundreds of GB) when the regular table only reports very minimal bloat. So
I don't recommend relying completely on the approximate check.

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Rakesh T 2021-02-18 04:34:01 Aurora PostgreSQL Support
Previous Message Victor Sudakov 2021-02-17 02:27:24 Re: index bloat estimation