Re: replication primary writting infinite number of WAL files

From: Les <nagylzs(at)gmail(dot)com>
To: Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: replication primary writting infinite number of WAL files
Date: 2023-11-24 13:12:47
Message-ID: CAKXe9UAMP+gxUr9afGJxzdyWOC6oHE1q=mnpREnSBjtAyB6cfQ@mail.gmail.com
Lists: pgsql-general

>
> > First I was also thinking about vacuum. But removing a replication
> > slot should have no effect on vacuum on the primary (AFAIK). Please
> > correct me if I'm wrong.
> >
>
> Yeah, it depends. There are two processes:
>
> * one process generating the WALs, maybe a VACUUM
> * an inactive slot holding the WALs
>
> For instance, if a standby is not reachable, the WALs will accumulate
> within the slot until the standby is reachable again.
>

I understand that an unreachable standby can cause WAL files to accumulate in
the pg_wal directory. This has happened before, and it is expected. What I
don't understand is the amount and the speed. The write rate went up from the
normal 5 MB/s to 1500 MB/s within a minute, and when the slot was removed, it
went back down to normal. We could easily have handled an ordinary
disconnected standby, because free disk space is monitored, but in this case
there was not enough time to react. PostgreSQL filled the remaining 40% of
free disk space in a matter of minutes. By the time we got the alert and
logged into the server, it was already too late: the disk was full.

There is a strong correlation between the speed/amount of data written and
the existence of that replication slot. If we drop the slot, the write rate
drops immediately. If we add the slot again, the problem comes back after
some time. (All I can say is that this has happened three times.)
Interestingly, it does not happen with the other standby, which is still
connected and works flawlessly. I don't know of any normal PostgreSQL
mechanism that could cause this behaviour. We have already ruled out client
applications: we shut down all client apps, increased the volume size and
restarted PostgreSQL, but that did not solve the problem.
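
One way to separate "more WAL is being generated" from "WAL is merely being
retained" is to sample LSNs over time. pg_current_wal_lsn() returns the
current WAL position, the restart_lsn column of pg_replication_slots shows
how far back each slot pins WAL, and pg_wal_lsn_diff() does the subtraction
on the server. The Python sketch below repeats that arithmetic on LSN strings
copied from psql output: two samples of pg_current_wal_lsn() taken a known
number of seconds apart give the WAL generation rate, and the current LSN
minus a slot's restart_lsn gives the WAL that slot is holding back. The
sample LSNs and the 60-second interval are made up for illustration.

```python
# Sketch: LSN arithmetic equivalent to pg_wal_lsn_diff(), for LSN strings
# copied from psql output. All sample values below are hypothetical.


def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN in text form 'HIGH/LOW' (both hex) to an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lsn_diff(a: str, b: str) -> int:
    """Bytes from LSN b to LSN a, like pg_wal_lsn_diff(a, b)."""
    return lsn_to_bytes(a) - lsn_to_bytes(b)


def wal_rate_mb_per_s(lsn_before: str, lsn_after: str, seconds: float) -> float:
    """WAL generation rate between two pg_current_wal_lsn() samples, in MB/s."""
    return lsn_diff(lsn_after, lsn_before) / seconds / 1000 ** 2


# Two hypothetical samples of pg_current_wal_lsn(), taken 60 seconds apart.
before, after = "19/80000000", "1A/00000000"
# Hypothetical restart_lsn of an inactive slot from pg_replication_slots.
slot_restart_lsn = "18/00000000"

print(f"WAL generated:    {wal_rate_mb_per_s(before, after, 60):.1f} MB/s")
print(f"WAL held by slot: {lsn_diff(after, slot_restart_lsn) / 1000 ** 3:.1f} GB")
```

Against a live server the same numbers come straight from SQL, e.g.
SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
FROM pg_replication_slots; the Python version only helps when comparing
values collected by hand or by a monitoring script.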

Laszlo
