Re: filesystem full during vacuum - space recovery issues

From: Thomas Simpson <ts(at)talentstack(dot)to>
To: pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: filesystem full during vacuum - space recovery issues
Date: 2024-07-18 13:54:07
Message-ID: a17b5918-dd6e-4e02-a31a-c75a6fc7c697@talentstack.to
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin pgsql-hackers

Thanks Ron for the suggestions - I applied some of the settings which
helped throughput a little bit but were not an ideal solution for me -
let me explain.

Due to the size, I do not have the option to use the directory mode (or
anything that uses disk space) for dump as that creates multiple
directories (hence why it can do multiple jobs).  I do not have the
several hundred TB of space to hold the output and there is no practical
way to get it, especially for a transient reload.

I have my original server plus my replica; as the replica also applied
the WALs, it too filled up and went down.  I've basically recreated this
as a primary server and am using a pipeline to dump from the original
into this as I know that has enough space for the final loaded database
and should have space left over from the clean rebuild (whereas the
original server still has space exhausted due to the leftover files).

Incidentally, this state is also why going to a backup is not helpful
either as the restore and then re-apply the WALs would just end up
filling the disk and recreating the original problem.

Even with the improved throughput, current calculations are pointing to
almost 30 days to recreate the database through dump and reload which is
a pretty horrible state to be in.

I think this is perhaps an area of improvement - especially as larger
PostgreSQL databases become more common, I'm not the only person who
could face this issue.

Perhaps an additional dumpall mode that generates multiple output pipes
(I'm piping via netcat to the other server) - it would need to combine
with a multiple listening streams too and some degree of
ordering/feedback to get to the essentially serialized output from the
current dumpall.  But this feels like PostgreSQL expert developer territory.

Thanks

Tom

On 17-Jul-2024 09:49, Ron Johnson wrote:
> On Wed, Jul 17, 2024 at 9:26 AM Thomas Simpson <ts(at)talentstack(dot)to> wrote:
---8<--snip,snip---8<---
> That would, of course, depend on what you're currently doing. 
> pg_dumpall of a Big Database is certainly suboptimal compared to
> "pg_dump -Fd --jobs=24".
>
> This is what I run (which I got mostly from a databasesoup.com
> <http://databasesoup.com> blog post) on the target instance before
> doing "pg_restore -Fd --jobs=24":
> declare -i CheckPoint=30
> declare -i SharedBuffs=32
> declare -i MaintMem=3
> declare -i MaxWalSize=36
> declare -i WalBuffs=64
> pg_ctl restart -wt$TimeOut -mfast \
>         -o "-c hba_file=$PGDATA/pg_hba_maintmode.conf" \
>         -o "-c fsync=off" \
>         -o "-c log_statement=none" \
>         -o "-c log_temp_files=100kB" \
>         -o "-c log_checkpoints=on" \
>         -o "-c log_min_duration_statement=120000" \
>         -o "-c shared_buffers=${SharedBuffs}GB" \
>         -o "-c maintenance_work_mem=${MaintMem}GB" \
>         -o "-c synchronous_commit=off" \
>         -o "-c archive_mode=off" \
>         -o "-c full_page_writes=off" \
>         -o "-c checkpoint_timeout=${CheckPoint}min" \
>         -o "-c max_wal_size=${MaxWalSize}GB" \
>         -o "-c wal_level=minimal" \
>         -o "-c max_wal_senders=0" \
>         -o "-c wal_buffers=${WalBuffs}MB" \
>         -o "-c autovacuum=off"
>
> After the pg_restore -Fd --jobs=24 and vacuumdb --analyze-only --jobs=24:
> pg_ctl stop -wt$TimeOut && pg_ctl start -wt$TimeOut
>
> Of course, these parameter values were for *my* hardware.
>
> My backups were in progress when all the issues happened, so
> they're not such a good starting point and I'd actually prefer the
> clean reload since this DB has been through multiple upgrades
> (without reloads) until now so I know it's not especially clean. 
> The size has always prevented the full reload before but the
> database is relatively low traffic now so I can afford some time
> to reload, but ideally not 10 days.
>
>> Yours,
>> Laurenz Albe
>

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Ron Johnson 2024-07-18 15:16:05 Re: filesystem full during vacuum - space recovery issues
Previous Message Imran Khan 2024-07-18 12:17:31 Re: Error; pg_upgrade 10 to 15

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2024-07-18 14:00:15 Re: Built-in CTYPE provider
Previous Message Fujii Masao 2024-07-18 13:47:16 Re: Add a GUC check hook to ensure summarize_wal cannot be enabled when wal_level is minimal