Re: replication primary writting infinite number of WAL files

From: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
To: "pgsql-general(at)lists(dot)postgresql(dot)org" <pgsql-general(at)lists(dot)postgresql(dot)org>
Subject: Re: replication primary writting infinite number of WAL files
Date: 2023-11-24 16:11:56
Message-ID: CANzqJaAnsfBazSEZrH8_Wib_CCoK-Zu3mKYC3qwdvRqdjmOLwA@mail.gmail.com
Lists: pgsql-general

On Fri, Nov 24, 2023 at 11:00 AM Les <nagylzs(at)gmail(dot)com> wrote:
[snip]

> Writing of WAL files continued after we shut down all clients, and
> restarted the primary PostgreSQL server.
>
> The order was:
>
> 1. shut down all clients
> 2. stop the primary
> 3. start the primary
> 4. primary started to write like mad again
> 5. removed replication slot
> 6. primary stopped madness and deleted all WAL files (except for a few)
>
> How can the primary server generate more and more WAL files (writes) after
> all clients have been shut down and the server was restarted? My only bet
> was the autovacuum. But I ruled that out, because removing a replication
> slot has no effect on the autovacuum (am I wrong?). Now you are saying that
> this looks like a huge rollback. Does rolling back changes require even
> more data to be written to the WAL after server restart? As far as I know,
> if something was not written to the WAL, then it is not something that can
> be rolled back. Does removing a replication slot lessen the amount of data
> needed to be written for a rollback (or for anything else)? It is a fact
> that the primary stopped writing at 1.5GB/sec the moment we removed the
> slot.
>
> I'm not saying that you are wrong. Maybe there was a
> crazy application. I'm just saying that a crazy application cannot be the
> whole picture. It cannot explain this behaviour as a whole. Or maybe I have
> a deep misunderstanding about how WAL files work. On the second occasion,
> the primary was running for a few minutes when pg_wal started to increase.
> We noticed that early, and shut down all clients, then restarted the
> primary server. After the restart, the primary was writing out more WAL
> files for many more minutes, until we dropped the slot again. I.e. it was
> writing much more data after the restart than before the restart; and it
> only stopped (exactly) when we removed the slot.
>

pg_stat_activity will tell you something about what's happening even after
you think "all clients have been shut down".
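
A minimal sketch of that check, assuming PostgreSQL 10 or later (the
backend_type column does not exist in older releases). It lists every other
session and background process, including walsenders and autovacuum workers,
so anything still connected after the clients are supposedly gone shows up:

```sql
-- Everything the server is running right now, except this session.
-- Assumes PostgreSQL 10+ (backend_type was added in version 10).
SELECT pid,
       backend_type,
       usename,
       application_name,
       client_addr,
       state,
       xact_start,
       wait_event_type,
       wait_event,
       left(query, 200) AS query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid()
ORDER BY backend_type, xact_start NULLS LAST;
```

Run it as a superuser (or a member of pg_read_all_stats), otherwise the query
text of other users' sessions is hidden.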

I'd crank up the logging to at least:
log_error_verbosity = verbose
log_statement = all
track_activity_query_size = 10240
client_min_messages = notice
log_line_prefix = '%m\t%r\t%u\t%d\t%p\t%i\t%a\t%e\t'
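
One possible way to apply part of this without editing postgresql.conf by
hand, sketched under the assumption that you connect as a superuser. Only two
of the settings are shown; the final query reports each setting's context, so
you can see for yourself which ones take effect on reload and which need a
server restart:

```sql
-- Persist two of the logging settings and reload the configuration.
ALTER SYSTEM SET log_error_verbosity = 'verbose';
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();

-- Check the current values. context = 'postmaster' means the setting
-- only changes at server start; pending_restart flags values that are
-- set in the config files but not yet active.
SELECT name, setting, context, pending_restart
FROM pg_settings
WHERE name IN ('log_error_verbosity',
               'log_statement',
               'track_activity_query_size',
               'client_min_messages',
               'log_line_prefix');
```

Note that log_statement = all is very verbose on a busy server, so it is
worth reverting with ALTER SYSTEM RESET once the problem has been captured.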
