Re: Index bloat and REINDEX/VACUUM optimization for partial index

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
Cc: jayaprabhakar k <jayaprabhakar(at)gmail(dot)com>, psql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Index bloat and REINDEX/VACUUM optimization for partial index
Date: 2023-09-01 03:18:12
Message-ID: CAMkU=1xKJHOQen=kRDWcXoQ-ZMcqQm0F6uYfJKAV47XMFP-pyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Thu, Aug 31, 2023 at 11:06 AM Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

> With the default value of autovacuum_vacuum_scale_factor (The default is
> 0.2 (20% of table size).) index will collect like 100M outdated/dead index
> entries before autovacuum kicks in and cleans them all (in a worst case),
> and of course it will lead to huge index bloat and awful performance.
>

Index bloat doesn't automatically lead to awful performance. There must be
some additional factor at play.

> Even if you scale down autovacuum_vacuum_scale_factor to some
> unreasonable low value like 0.01, the index still bloats to the 5M dead
> entries before autovacuum run, and constant vacuuming of a huge 500M table
> will put a huge load on the database server.
>

For this type of situation, I would generally set
autovacuum_vacuum_scale_factor to 0, and use autovacuum_vacuum_threshold to
drive the vacuuming instead. But I'd make those changes just on the queue
table(s), not system wide. Due to the visibility map, the load on the
server does not need to be huge just due to the table, as the stable part
of the table can be ignored. The problem is that each index still needs to
be read entirely for each vacuum cycle, which would not be much of a
problem for the partial indexes, but certainly could be for the full
indexes. There are some very recent improvements in this area, but I don't
think they can be applied selectively to specific indexes.

>
> Unfortunately there is no easy way out of this situation from database
> side, in general I recommend not trying to implement a fast pacing queue
> like load inside of a huge and constantly growing table, it never works
> well because you cannot keep up partial efficient indexes for the queue in
> a clean/non-bloated state.
>
> In my opinion the best solution is to keep list of entries to process ("*around
> 1000-1500 tasks in pending statuses")* duplicated in the separate tiny
> table (via triggers or implement it on the application level), in that case
> autovacuum will be able quickly clean dead entries from the index.
>

You should be able to use declarative partitioning to separate the "final"
tuples from the "active" tuples, to get the same benefit but with less work.

Cheers,

Jeff

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Rondat Flyag 2023-09-01 13:41:13 Re: Queries containing ORDER BY and LIMIT started to work slowly
Previous Message Jeff Janes 2023-09-01 03:01:06 Re: Index bloat and REINDEX/VACUUM optimization for partial index