Re: WAL-files is not removing authomaticaly

From: Andrew Anderson <forumwriter007(at)gmail(dot)com>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Cc: "pgsql-generallists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: WAL-files is not removing authomaticaly
Date: 2021-03-11 07:43:19
Message-ID: CAHAzDikSTvg5y8u-sNbpTNtQ_p3jWjhJBzPDMW1pU5T1x-MKFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> Never, ever, remove WAL files manually.
> A crash might render your database unrecoverable.
You're right, it's not a good idea.

> You'll have to figure out what is blocking WAL removal.
This is my favorite dream!

> Are there archive failures reported in "pg_stat_archiver" and the log?
master=# select * from pg_stat_archiver;
archived_count | last_archived_wal | last_archived_time
| failed_count | last_failed_wal | last_failed_time | stats_reset

----------------+--------------------------+-------------------------------+--------------+-----------------+------------------+------------------------------
3021 | 000000010000000B000000C9 | 2021-03-11 08:53:56.133796+02
| 0 | | | 2021-01-06
14:33:40.70147+02
(1 row)

In today's logs there is no any errors:
$ grep "FATAL\|WARNING" /opt/pglogs/master.log
$

$ grep "FATAL\|WARNING" /opt/pglogs/slave.log
$

> You say that replication is working fine, but try
>
> SELECT pg_current_wal_lsn();
>
> on the primary and compare that with the "restart_lsn" of the replication
slot.

It parameters is differ:
master=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
B/CB099398
(1 row)

master=# select * from pg_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary
| active | active_pid | xmin | catalog_xmin | restart_lsn |
confirmed_flush_lsn
----------------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
standby_slot | | physical | | | f | t
| 2631806 | | | B/CB0AC068 |
(1 row)

But when I add new data to the table on master, it apears on slave.

> Look for strange messages in the log file on both servers.
Looking for strange messages gives nothing, in today's logs there is no any
strange messages, just about connects of applications.

Regards, Andrew

ср, 10 мар. 2021 г. в 19:40, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>:

> On Wed, 2021-03-10 at 14:34 +0200, Forum Writer wrote:
> > Have Streaming replica with 2 hosts, master and slave with PostgreSQL
> 10.6.
> > Replica works fine and data from master copying to slave without any
> problem,
> > but on slave have a problem: wal-files in catalog ${PGDATA}/pg_wal is not
> > removing authomaticaly and may be collects over some years.
> >
> > Both, master (1.1.1.1) and slave (2.2.2.2), have simmilar replication
> slots:
> >
> > master=# select * from pg_replication_slots;
> > slot_name | plugin | slot_type | datoid | database |
> temporary | active | active_pid | xmin | catalog_xmin | restart_lsn |
> confirmed_flush_lsn
> >
> ---------------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
> > standby_slot | | physical | | | f | t
> | 1184 | | | 8/1C16C508 |
> > (1 row)
> >
> > and seted in ${PGDATA}/recovery.conf on slave:
> >
> > standby_mode = 'on'
> > primary_conninfo = 'user=pgadmin host=2.2.2.2 port=5432 user=pgadmin
> sslmode=prefer sslcompression=1'
> > recovery_target_timeline = 'latest'
> > trigger_file = 'failover'
> > restore_command = 'cp /opt/archivelog/%f %p'
> > archive_cleanup_command = '/usr/pgsql-10/bin/pg_archivecleanup
> /opt/archivelog %r'
> > primary_slot_name = 'standby_slot'
> >
> > In ${PGDATA}/postgresql.conf on both hosts sets:
> >
> > max_wal_size = 8GB
> > wal_keep_segments = 32
> >
> > But yesterday pg_wal was 16G and had 1018 files inside, thats why there
> was no other
> > way except deleting its manualy, but not of all,
> > just 6 hundreds files, which had modification time before September 2020
> except
> > 000000010000000000000034.00000028.backup.
> > Trying to rebuild slave from master with: making pg_start_backup,
> copying ${PGDATA}
> > files from master to slave (without removing on slave any files in
> ${PGDATA}),
> > pg_stop_backup, tunning settings on slave and run it again had no any
> success,
> > wal-files still is not removing automaticaly.
> > Does anybody knows how to fix it ?
>
> Never, ever, remove WAL files manually.
> A crash might render your database unrecoverable.
>
> You'll have to figure out what is blocking WAL removal.
>
> Are there archive failures reported in "pg_stat_archiver" and the log?
>
> You say that replication is working fine, but try
>
> SELECT pg_current_wal_lsn();
>
> on the primary and compare that with the "restart_lsn" of the replication
> slot.
>
> Look for strange messages in the log file on both servers.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Sandeep Saxena 2021-03-11 08:50:48 Re: pgAgent for multiple databases in same instance
Previous Message David G. Johnston 2021-03-11 06:39:52 Re: Example 31-2. libpq Example Program 2