Re: replication primary writting infinite number of WAL files

From: Sándor Daku <daku(dot)sandor(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: replication primary writting infinite number of WAL files
Date: 2023-11-24 16:26:15
Message-ID: CAKyoTgb80GwiGJdR92rtAh7DhU40r7SY3KeNep3nzrsjeBHp3Q@mail.gmail.com
Lists: pgsql-general

On Fri, 24 Nov 2023, 17:12 Ron Johnson, <ronljohnsonjr(at)gmail(dot)com> wrote:

> On Fri, Nov 24, 2023 at 11:00 AM Les <nagylzs(at)gmail(dot)com> wrote:
> [snip]
>
>> Writing of WAL files continued after we shut down all clients, and
>> restarted the primary PostgreSQL server.
>>
>> The order was:
>>
>> 1. shut down all clients
>> 2. stop the primary
>> 3. start the primary
>> 4. primary started to write like mad again
>> 5. removed replication slot
>> 6. primary stopped madness and deleted all WAL files (except for a few)
>>
>> How can the primary server generate more and more WAL files (writes)
>> after all clients have been shut down and the server was restarted? My only
>> bet was the autovacuum. But I ruled that out, because removing a
>> replication slot has no effect on the autovacuum (am I wrong?). Now you are
>> saying that this looks like a huge rollback. Does rolling back changes
>> require even more data to be written to the WAL after server restart? As
>> far as I know, if something was not written to the WAL, then it is not
>> something that can be rolled back. Does removing a replication slot lessen
>> the amount of data needed to be written for a rollback (or for anything
>> else)? It is a fact that the primary stopped writing at 1.5GB/sec the
>> moment we removed the slot.
>>
>> I'm not saying that you are wrong. Maybe there was a
>> crazy application. I'm just saying that a crazy application cannot be the
>> whole picture. It cannot explain this behaviour as a whole. Or maybe I have
>> a deep misunderstanding about how WAL files work. On the second occasion,
>> the primary was running for a few minutes when pg_wal started to increase.
>> We noticed that early, and shut down all clients, then restarted the
>> primary server. After the restart, the primary was writing out more WAL
>> files for many more minutes, until we dropped the slot again. I.e. it was
>> writing much more data after the restart than before the restart; and it
>> only stopped (exactly) when we removed the slot.
>>
>
> pg_stat_activity will tell you something about what's happening even after
> you think "all clients have been shut down".
>
> I'd crank up the logging to at least:
> log_error_verbosity = verbose
> log_statement = all
> track_activity_query_size = 10240
> client_min_messages = notice
> log_line_prefix = '%m\t%r\t%u\t%d\t%p\t%i\t%a\t%e\t'
>

I don't know if it makes any sense, but is there a relatively painless way
to look into the produced WAL files to see what they are filled with? It
might give some pointers to the source of the issue.
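
One candidate is pg_waldump, which ships with PostgreSQL and prints one
line per WAL record (it also has a --stats option for a per-resource-manager
summary). As a rough sketch only, the Python below tallies the bytes per
resource manager and per transaction id from saved pg_waldump text output.
The line layout it expects ("rmgr: ... len (rec/tot): a/ b, tx: n, ...") is
an assumption about the output format and may differ between versions; the
names summarize and RECORD_RE are made up for this example.

```python
import re
import sys
from collections import Counter

# Assumed shape of one pg_waldump output line (may vary by version):
#   rmgr: Heap   len (rec/tot):   59/   59, tx:   734, lsn: 0/01A2B3C8, ...
RECORD_RE = re.compile(
    r"rmgr:\s+(?P<rmgr>\S+)\s+len \(rec/tot\):\s*\d+/\s*(?P<tot>\d+),"
    r"\s+tx:\s*(?P<tx>\d+)"
)


def summarize(lines):
    """Sum total record length per resource manager and per transaction id."""
    by_rmgr = Counter()
    by_tx = Counter()
    for line in lines:
        match = RECORD_RE.search(line)
        if match is None:
            continue  # skip anything that does not look like a WAL record
        size = int(match.group("tot"))
        by_rmgr[match.group("rmgr")] += size
        by_tx[int(match.group("tx"))] += size
    return by_rmgr, by_tx


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: the argument is a file holding pg_waldump output.
    with open(sys.argv[1]) as handle:
        rmgr_totals, tx_totals = summarize(handle)
    for name, size in rmgr_totals.most_common(10):
        print(f"{name:<20} {size:>14} bytes")
    for tx, size in tx_totals.most_common(10):
        print(f"tx {tx:<17} {size:>14} bytes")
```

To try it, redirect the pg_waldump output for one of the suspicious
segments into a text file and pass that file name as the only argument.
If a single transaction id or resource manager dominates the totals, that
would at least narrow down where the writes come from. Records that do not
belong to a transaction show up under tx 0.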

Regards,
Sándor

