Re: Options for more aggressive space reclamation in vacuuming?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Options for more aggressive space reclamation in vacuuming?
Date: 2023-05-15 14:40:18
Message-ID: CAMkU=1w8KaEBLDJt4RorTdwvgRuFXiojrCN17TnvfQKVtDY+eg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

On Sun, May 14, 2023 at 5:03 PM Wells Oliver <wells(dot)oliver(at)gmail(dot)com> wrote:

> I think our use-case here is nothing special: just very large tables that
> are deleted from and re-inserted with some frequency. They store monthly
> data and exist as partitions. As an example, July of 2022 (summer months
> are peak) had 1,630,558,336 rows. We delete from this and insert daily.
>

You delete and reinsert every row every day, or just some lesser number of
rows each day? In one giant transaction daily, or in a series of short
mostly non-overlapping transactions throughout the day?

> We rarely get new, relevant data once the month is over. The n_dead_tup
> from pg_stat_all_tables here was over 7m rows, and clearing that out gave
> us back nearly 50GB, and the file size estimate on this partition was ~200
> GB.
>

Are these sizes for the entire relation size, or just for the core table
(not TOAST, not indexes)? Index bloat is harder to prevent than table
bloat, but is easier to deal with after the fact (reindex concurrently)

> These tables get auto-vacuumed but clearly it's not, well,
> aggressively reclaiming space.
>

50 GB out of 200 GB seems roughly in accord with the default setting
of autovacuum_vacuum_scale_factor. So maybe just lowering that from 0.2
to, say, 0.05 would be good enough, either globally or just for these
tables.

>
> Given that they are partitions, VACUUM FULL locks the parent table, so new
> writes/deletes can't happen, and anyway people like to run queries against
> these tables quite frequently.
>

Since you rarely get new data once the month is over, you could lock the
partition against concurrent changes (but still allow reads) once it is no
longer "on the run" while you copy the data into a new tightly-packed table
and index that and add constraints. Then you need only a very brief access
exclusive lock while you detach/drop the old partition and attach the new
table in its place. Of course you can't drop the weaker lock while you
acquire the stronger one lest the old table change during the gap so lock
management can be a bit tedious, but it is surely less restrictive than a
VACUUM FULL.

Cheers,

Jeff

>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Rui DeSousa 2023-05-15 14:53:03 Re: how do I capture conflicting rows
Previous Message Mukesh Rajpurohit 2023-05-15 13:58:00 Re: how do I capture conflicting rows