Re: filesystem full during vacuum - space recovery issues

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: Pgsql-admin <pgsql-admin(at)lists(dot)postgresql(dot)org>
Subject: Re: filesystem full during vacuum - space recovery issues
Date: 2024-07-18 15:16:05
Message-ID: CANzqJaCyYDawnNx-EVV9ia4baB-jhs1iSYPcW6uE1hbG=c9R=g@mail.gmail.com
Lists: pgsql-admin pgsql-hackers

There's no free lunch, and you can't squeeze blood from a turnip.

Single-threading will *ALWAYS* be slow. If you want speed, temporarily
throw more hardware at it; specifically another disk (and possibly more RAM
and CPU).
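For the record, the single-stream pipeline discussed further down the
thread boils down to something like this. This is only a sketch, not a
tested procedure, and the hostnames are placeholders:

```shell
#!/bin/bash
# Sketch of a single-stream dump/reload that needs no intermediate
# disk space. SRC_HOST/DST_HOST are placeholders, not real hosts;
# nothing runs until pipe_reload is called against live clusters.
SRC_HOST=${SRC_HOST:-source.example}
DST_HOST=${DST_HOST:-target.example}

pipe_reload() {
  # pg_dumpall writes plain SQL to stdout; psql replays it on the
  # target as it arrives, so neither side stages a dump file.
  pg_dumpall --host "$SRC_HOST" \
    | psql --host "$DST_HOST" --dbname postgres
}
```

The trade-off is that this is exactly the serialized, single-threaded
path being complained about below; it saves disk, not time.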

On Thu, Jul 18, 2024 at 9:55 AM Thomas Simpson <ts(at)talentstack(dot)to> wrote:

> Thanks Ron for the suggestions - I applied some of the settings which
> helped throughput a little bit but were not an ideal solution for me - let
> me explain.
>
> Due to the size, I do not have the option to use directory mode (or
> anything else that writes to disk) for the dump, since it creates multiple
> output files (which is what lets it run multiple jobs). I do not have the
> several hundred TB of space to hold the output, and there is no practical
> way to get it, especially for a transient reload.
>
> I have my original server plus my replica; as the replica also applied the
> WALs, it too filled up and went down. I've basically recreated this as a
> primary server and am using a pipeline to dump from the original into this
> as I know that has enough space for the final loaded database and should
> have space left over from the clean rebuild (whereas the original server
> still has space exhausted due to the leftover files).
>
> Incidentally, this state is also why going back to a backup is not helpful
> either: restoring and then re-applying the WALs would just fill the disk
> and recreate the original problem.
>
> Even with the improved throughput, current calculations are pointing to
> almost 30 days to recreate the database through dump and reload which is a
> pretty horrible state to be in.
>
> I think this is perhaps an area of improvement - especially as larger
> PostgreSQL databases become more common, I'm not the only person who could
> face this issue.
>
> Perhaps an additional dumpall mode that generates multiple output pipes
> (I'm piping via netcat to the other server) - it would need to pair with
> multiple listening streams on the receiving side, plus some degree of
> ordering/feedback, to reproduce the essentially serialized output of the
> current dumpall. But this feels like PostgreSQL expert developer territory.
>
> Thanks
>
> Tom
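
Until something like that exists in pg_dumpall itself, you can approximate
multiple streams today by dumping each database separately and piping them
concurrently. A rough, untested sketch (hostnames are placeholders, and
error handling is omitted):

```shell
#!/bin/bash
# Untested sketch: approximate multiple output streams with one
# concurrent pipe per database instead of one serialized pg_dumpall.
# SRC_HOST/DST_HOST are placeholders; nothing executes until
# parallel_reload is invoked against live clusters.
SRC_HOST=${SRC_HOST:-source.example}
DST_HOST=${DST_HOST:-target.example}

parallel_reload() {
  # Roles and tablespaces first, as a single small stream.
  pg_dumpall --host "$SRC_HOST" --globals-only \
    | psql --host "$DST_HOST" --dbname postgres

  # Then stream each database down its own pipe, in parallel.
  for db in $(psql --host "$SRC_HOST" --tuples-only --no-align \
      --command "SELECT datname FROM pg_database WHERE NOT datistemplate"); do
    createdb --host "$DST_HOST" "$db" 2>/dev/null
    pg_dump --host "$SRC_HOST" "$db" \
      | psql --host "$DST_HOST" --dbname "$db" &
  done
  wait
}
```

This only parallelizes across databases, not within one huge database, so
it helps less when a single database dominates the cluster.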
>
>
> On 17-Jul-2024 09:49, Ron Johnson wrote:
>
> On Wed, Jul 17, 2024 at 9:26 AM Thomas Simpson <ts(at)talentstack(dot)to> wrote:
>
> ---8<--snip,snip---8<---
>
> That would, of course, depend on what you're currently doing. pg_dumpall
> of a Big Database is certainly suboptimal compared to "pg_dump -Fd
> --jobs=24".
>
> This is what I run (which I got mostly from a databasesoup.com blog post)
> on the target instance before doing "pg_restore -Fd --jobs=24":
> declare -i CheckPoint=30
> declare -i SharedBuffs=32
> declare -i MaintMem=3
> declare -i MaxWalSize=36
> declare -i WalBuffs=64
> pg_ctl restart -wt$TimeOut -mfast \
> -o "-c hba_file=$PGDATA/pg_hba_maintmode.conf" \
> -o "-c fsync=off" \
> -o "-c log_statement=none" \
> -o "-c log_temp_files=100kB" \
> -o "-c log_checkpoints=on" \
> -o "-c log_min_duration_statement=120000" \
> -o "-c shared_buffers=${SharedBuffs}GB" \
> -o "-c maintenance_work_mem=${MaintMem}GB" \
> -o "-c synchronous_commit=off" \
> -o "-c archive_mode=off" \
> -o "-c full_page_writes=off" \
> -o "-c checkpoint_timeout=${CheckPoint}min" \
> -o "-c max_wal_size=${MaxWalSize}GB" \
> -o "-c wal_level=minimal" \
> -o "-c max_wal_senders=0" \
> -o "-c wal_buffers=${WalBuffs}MB" \
> -o "-c autovacuum=off"
>
> After the pg_restore -Fd --jobs=24 and vacuumdb --analyze-only --jobs=24:
> pg_ctl stop -wt$TimeOut && pg_ctl start -wt$TimeOut
>
> Of course, these parameter values were for *my* hardware.
>
>> My backups were in progress when all the issues happened, so they're not
>> such a good starting point, and I'd actually prefer the clean reload
>> since this DB has been through multiple upgrades (without reloads) until
>> now, so I know it's not especially clean. The size has always prevented
>> a full reload before; the database is relatively low traffic now, though,
>> so I can afford some time to reload, but ideally not 10 days.
>>
>> Yours,
>> Laurenz Albe
>>
>>
