From: | Stephen Frost <sfrost(at)snowman(dot)net> |
---|---|
To: | Euler Taveira <euler(at)eulerto(dot)com> |
Cc: | Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, SATYANARAYANA NARLAPURAM <satyanarlapuram(at)gmail(dot)com> |
Subject: | Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary |
Date: | 2021-12-29 13:57:10 |
Message-ID: | 20211229135710.GH15820@tamriel.snowman.net |
Lists: | pgsql-hackers |
Greetings,
* Euler Taveira (euler(at)eulerto(dot)com) wrote:
> On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > pg_archivecleanup currently takes a WAL file name as input to delete
> > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > automatically detect the last checkpoint (from control file) LSN,
> > calculate the lowest restart_lsn required by the replication slots, if
> > any (by reading the replication slot info from pg_logical directory),
> > archive the unneeded (an archive_command similar to that of the one
> > provided in the server config can be provided as an input) WAL files
> > before finally deleting them? Making pg_archivecleanup tool as an
> > end-to-end solution will help greatly in disk full situations because
> > of WAL files growth (inactive replication slots, archive command
> > failures, infrequent checkpoint etc.).
Having a tool for this isn't a bad idea overall, but ..
> pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> suggesting to use it for removing files from pg_wal directory too? No, thanks.
We definitely shouldn't make it part of pg_archivecleanup, for the
simple reason that it'd be really confusing and would almost certainly
be misused. For my 2c, we should just remove pg_archivecleanup
entirely.
> WAL files are a key component for backup and replication. Hence, you cannot
> deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> wouldn't occur if you have a monitoring system and alerts and someone to keep
> an eye on it. If the disk full situation was caused by a failed archive command
> or a disconnected standby, it is easy to figure out; the fix is simple.
This is perhaps going a bit far: PG does, in fact, remove WAL files from
PGDATA. Having a tool which will do this safely when the server can't
be brought online due to lack of disk space would certainly be
helpful rather frequently. I agree that monitoring and alerting are
things that everyone should implement and pay attention to, but that
doesn't always happen. Instead, people end up just blowing away pg_wal and
corrupting their database when, had such a tool existed, they could have
avoided that and brought the system back online in relatively
short order without any data loss.
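To make the "safely" part a bit more concrete, here is a rough sketch of the cutoff calculation such a tool would need. It is not an existing tool or patch. It assumes the default 16MB WAL segment size, takes the REDO location of the latest checkpoint (as reported by pg_controldata) and any slot restart_lsn values as plain "X/Y" strings, and ignores archive status (.ready files), wal_keep_size and timeline switches, all of which a real tool would have to handle. The function names are made up for illustration.

```python
# Sketch only: decide which WAL segment files in pg_wal are older than
# everything still needed for crash recovery and by replication slots.
# Assumption: default 16MB segments; clusters can be initialized differently.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024

HEX_DIGITS = set("0123456789ABCDEF")


def parse_lsn(text):
    """Convert an LSN in 'X/Y' hex notation to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline, lsn, seg_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file containing the given LSN."""
    segs_per_xlogid = 0x100000000 // seg_size
    segno = lsn // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


def removable_segments(wal_files, redo_lsn, slot_restart_lsns, timeline=1,
                       seg_size=WAL_SEGMENT_SIZE):
    """Return segment files strictly older than the oldest LSN still needed.

    The oldest needed LSN is the minimum of the checkpoint REDO location
    and every slot's restart_lsn. Like pg_archivecleanup, the comparison
    skips the 8-character timeline prefix of the file name.
    """
    needed = [parse_lsn(redo_lsn)]
    needed.extend(parse_lsn(lsn) for lsn in slot_restart_lsns)
    cutoff = wal_file_name(timeline, min(needed), seg_size)
    return sorted(
        name for name in wal_files
        if len(name) == 24
        and set(name) <= HEX_DIGITS      # skip .history, .partial, etc.
        and name[8:] < cutoff[8:]
    )
```

For example, with a REDO location of 0/4000028 and one slot holding restart_lsn 0/3000060, only the segments before 000000010000000000000003 would be candidates. Whether those candidates have actually been archived is a separate check that this sketch does not make.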
Thanks,
Stephen