Re: index bloat estimation

From: Keith Fiske <keith(dot)fiske(at)crunchydata(dot)com>
To: Victor Sudakov <vas(at)sibptus(dot)ru>
Cc: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: index bloat estimation
Date: 2021-02-17 00:06:39
Message-ID: CAODZiv5uVWASjuigUxJhmxmrQAdMoOxRu5vz97CkbuqH2-oxow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Sun, Feb 14, 2021 at 11:43 PM Victor Sudakov <vas(at)sibptus(dot)ru> wrote:

> Keith Fiske wrote:
> > >
> > > What queries do you use to estimate index and table bloat?
>
> [dd]
>
> > Why estimate when you can get the exact amount? At least for b-tree
> indexes
> > anyway.
> >
> > https://github.com/keithf4/pg_bloat_check
>
> Hello Keith,
>
> Thanks for this script and for reminding about pgstattuple.
>
> The script has the drawback of requiring r/w access to the database, so
> it cannot be run on a replica. But the idea is excellent. At least it
> can show the top N bloated relations, and then I can further explore with
> pgstattuple(), pgstatindex() etc.
>
>
> --
> Victor Sudakov, VAS4-RIPE, VAS47-RIPN
> 2:5005/49(at)fidonet http://vas.tomsk.ru/
>

You cannot run the actual scan on the replica, no. But it can be set to run
against the replica and just report the statistics so you can have a cron
set up to always run on the given system in case of failover. The
"--recovery_mode_norun" can be set so it will only run if the target system
is actually a primary.

I could possibly see about letting this actually run against the replica,
however this can be a rather long running transaction depending on the size
of the tables involved. You can set the "--commit_rate" to avoid some of
that, but if you have really large tables, it can still run quite long. So
this sort of check is really best run against the primary to avoid issues
around having to allow long running queries on the replica (delayed
replication or even worse bloat buildup).

However, once you get bloat under control, you likely shouldn't need to be
running this often, especially against the entire database. If a few tables
turn out to be problematic, you can make schedules just for them. And
otherwise run an entire database scan at most maybe once a month during
off-peak hours.

--
Keith Fiske
Senior Database Engineer
Crunchy Data - http://crunchydata.com

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Victor Sudakov 2021-02-17 02:27:24 Re: index bloat estimation
Previous Message Peter Eisentraut 2021-02-16 06:11:44 Re: pg_baseback could not connect in AWS linux 2