Re: Options for more aggressive space reclamation in vacuuming?

From: Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
To: Ron <ronljohnsonjr(at)gmail(dot)com>
Cc: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Options for more aggressive space reclamation in vacuuming?
Date: 2023-05-15 17:05:36
Message-ID: CAOC+FBWuTfb_pBt8f5K=XJETebzG-qr4XKuMKwH3RNBdb1zXfg@mail.gmail.com
Lists: pgsql-admin

The billion row table _is_ the monthly partition! We are in the process of
moving this scale of data to Redshift anyway, but that's a topic for
another mailing list. Just trying to nail down what I can in the interim. I
appreciate all the insights in this thread.

On Mon, May 15, 2023 at 10:04 AM Ron <ronljohnsonjr(at)gmail(dot)com> wrote:

> (A billion row table really should be partitioned. That ship sailed long
> ago, of course...)
>
> Sure you can "manually" adjust it. cron is your friend!!
>
> But... how long does it take to vacuum a freshly VACUUM FULLed table?
> Because I'm wondering if it would be more efficient to proactively keep the
> problem at bay by, for example, doing a plain VACUUM every... 2, 3, 4, 6
> (take your pick) hours, along with a plain VACUUM right after the monthly
> purge.
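A minimal sketch of that cron approach, as a crontab fragment (the database and partition names here are invented, and the schedule is just one of the intervals Ron mentions):

```
# m h dom mon dow  command
# Plain VACUUM (with ANALYZE) on the hot partition every 4 hours:
0 */4 * * *  vacuumdb --dbname=mydb --table=public.events_2023_05 --analyze --quiet
# Plain VACUUM shortly after the monthly purge (assumed here to finish before 03:00 on the 1st):
0 3 1 * *    vacuumdb --dbname=mydb --table=public.events_2023_05 --analyze --quiet
```

A cron job like this could also run an ALTER TABLE to step the table's autovacuum scale factor down through the month, which is one answer to the "manual adjustment" question below.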
>
> On 5/15/23 11:26, Wells Oliver wrote:
>
> This brings to mind another question... is there a clever solution for the
> situation where a 0.001 vacuum factor makes sense for a table at the end of
> the month, when it has reached its general capacity (maybe 1bn rows), but
> at the beginning of the month is too aggressive and would cause too-frequent
> vacuuming given the significantly lower volume of records? Simple manual
> adjustment through the month, or something easier?
>
> On Mon, May 15, 2023 at 8:21 AM Wolfgang Wilhelm <
> wolfgang20121964(at)yahoo(dot)de> wrote:
>
>> Hi,
>>
>> what do you mean by "aggressively reclaiming space"? Do you expect to see
>> more usable space in the file system? If so, you may be mistaken about the
>> way autovacuum works. It removes dead tuples _in_ the blocks and returns
>> space to the file system only when a contiguous run of blocks at the end
>> of the file is empty and can be truncated.
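To see where the space actually sits, dead-tuple counts can be compared with the physical free space inside the file; a sketch (the partition name is invented; pgstattuple is a contrib extension that must be installed first):

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Physical layout: table_len vs free_space vs dead_tuple_len
SELECT * FROM pgstattuple('public.events_2022_07');

-- The statistics already quoted in this thread:
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
WHERE relname = 'events_2022_07';
```

A large free_space with a small dead_tuple_len would mean vacuum has already done its job inside the blocks and only truncation (or a rewrite) can shrink the file.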
>>
>> Regards
>> WW
>>
>> Am Montag, 15. Mai 2023 um 16:40:47 MESZ hat Jeff Janes <
>> jeff(dot)janes(at)gmail(dot)com> Folgendes geschrieben:
>>
>>
>> On Sun, May 14, 2023 at 5:03 PM Wells Oliver <wells(dot)oliver(at)gmail(dot)com>
>> wrote:
>>
>> I think our use-case here is nothing special: just very large tables that
>> are deleted from and re-inserted with some frequency. They store monthly
>> data and exist as partitions. As an example, July of 2022 (summer months
>> are peak) had 1,630,558,336 rows. We delete from this and insert daily.
>>
>>
>> You delete and reinsert every row every day, or just some lesser number
>> of rows each day? In one giant transaction daily, or in a series of short
>> mostly non-overlapping transactions throughout the day?
>>
>>
>> We rarely get new, relevant data once the month is over. The n_dead_tup
>> from pg_stat_all_tables here was over 7m rows, and clearing that out gave
>> us back nearly 50GB, and the file size estimate on this partition was ~200
>> GB.
>>
>>
>> Are these sizes for the entire relation size, or just for the core table
>> (not TOAST, not indexes)? Index bloat is harder to prevent than table
>> bloat, but is easier to deal with after the fact (REINDEX CONCURRENTLY).
>>
>>
>> These tables get auto-vacuumed but clearly it's not, well,
>> aggressively reclaiming space.
>>
>>
>> 50 GB out of 200 GB seems roughly in accord with the default setting
>> of autovacuum_vacuum_scale_factor. So maybe just lowering that from 0.2
>> to, say, 0.05 would be good enough, either globally or just for these
>> tables.
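Per table, that could look like the following sketch (the partition name is invented; 0.05 is Jeff's example value). The arithmetic shows why the default matters at this row count:

```sql
-- Dead tuples needed before autovacuum triggers:
-- autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
SELECT 50 + 0.2  * 1630558336 AS with_default_0_20,  -- ~326 million
       50 + 0.05 * 1630558336 AS with_0_05;          -- ~82 million

-- Lower the factor for just this partition:
ALTER TABLE public.events_2022_07
    SET (autovacuum_vacuum_scale_factor = 0.05);
```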
>>
>>
>>
>> Given that they are partitions, VACUUM FULL locks the parent table, so
>> new writes/deletes can't happen, and anyway people like to run queries
>> against these tables quite frequently.
>>
>>
>> Since you rarely get new data once the month is over, you could lock the
>> partition against concurrent changes (but still allow reads) once it is no
>> longer "on the run" while you copy the data into a new tightly-packed table
>> and index that and add constraints. Then you need only a very brief access
>> exclusive lock while you detach/drop the old partition and attach the new
>> table in its place. Of course you can't drop the weaker lock while you
>> acquire the stronger one, lest the old table change during the gap, so
>> lock management can be a bit tedious; but it is surely less restrictive
>> than a VACUUM FULL.
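Jeff's repack-and-swap could be sketched roughly like this, assuming a parent table "events" range-partitioned by month; every name and bound below is invented for illustration:

```sql
BEGIN;
-- Block writes but still allow reads while copying:
LOCK TABLE events_2022_07 IN SHARE MODE;

CREATE TABLE events_2022_07_packed (LIKE events_2022_07 INCLUDING ALL);
INSERT INTO events_2022_07_packed SELECT * FROM events_2022_07;

-- Brief ACCESS EXCLUSIVE window for the actual swap:
ALTER TABLE events DETACH PARTITION events_2022_07;
ALTER TABLE events ATTACH PARTITION events_2022_07_packed
    FOR VALUES FROM ('2022-07-01') TO ('2022-08-01');
DROP TABLE events_2022_07;
COMMIT;
```

In practice one would usually load the data first and build indexes afterwards, and add a CHECK constraint matching the partition bounds so ATTACH PARTITION can skip its validation scan; this is only the shape of the dance, not a hardened script.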
>>
>> Cheers,
>>
>> Jeff
>>
>>
>
> --
> Wells Oliver
> wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
>
>
> --
> Born in Arizona, moved to Babylonia.
>

--
Wells Oliver
wells(dot)oliver(at)gmail(dot)com <wellsoliver(at)gmail(dot)com>
