Re: replication primary writting infinite number of WAL files

From: Les <nagylzs(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: replication primary writting infinite number of WAL files
Date: 2023-11-24 22:01:46
Message-ID: CAKXe9UBxKm3pTTUVmoOHMM96kGzN=Y3rHoP53dD=FytMzM8bPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>
>
> A single sequence for all id columns across all tables?
>
> How is the sequence value landing in the id column?
>
In most cases it is by using "nextval(seq_name)" in the SQL statement. But
sometimes the sequence value is taken first, and then multiple inserts are
sent with fixed increasing values. (Sometimes records reference each other,
such records are inserted within a short transaction with deferred foreign
key constraints.)

>
>> It might be possible to "select nextval" 1M times per second from an
> Pl/SQL function that does nothing else in an infinite loop. But that would
> not write too much data on the disk.
>
> On an old laptop:
>
> create table insert_test(id int);
>
> insert into insert_test select val from generate_series(1, 1000000) as
> t(val);
> INSERT 0 1000000
> Time: 943.918 ms
>
I would say there is more then just the id being inserted, unless all the
> other fields allow NULL.
>
>
>
>>
>> And there is nothing in the logs in that time period besides "checkpoints
>> are happening too frequently"?
>>
>>
>
> The "set transaction" warning is due to a bug in an application that calls
> SET TRANSACTION before starting a transaction.
>
> And what is that app doing when it does SET TRANSACTION?
>
It is the main app that is using the database, using jdbc. Multiple
instances were running when the sequence jumped.

>
>
>
>> How are the snapshots being taken?
>>
> zfs snapshot is taken on the standby, then zfs clone is created on the
> snapshot, and a new postgresql instance is started on top of the clone. It
> recovers within one minute. In a very few cases (two or three times in a
> year), it fails to recover. Then we use a different snapshot. When the dev
> instance starts up, then we immediately delete everything from repmgr
> nodes, and disable repmgr completely. Today we noticed that the dev db was
> created in a network that made it possible to connect to the prod primary.
> (dev db's network was not separated from the prod db's network, fixed
> today). The dev db might have connected to the prod/primary after startup.
> But that dev instance was created 7 days ago, so probably it is not related.
>
> What I know about ZFS would fit in the navel of flea, so someone else will
> have to comment on this.
>
>
> Dev connected to prod/primary how?
>
When the dev db starts up (from the cloned data directory), it still has
the old repmgr conf. That config is deleted, and repmgr is disabled right
after the startup, but there is the possibility that the dev db has
connected the primary when it was cloned (7 days ago), because at the
beginning of startup, it is the exact clone of the standby from a previous
point of time.

Laszlo

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Adrian Klaver 2023-11-24 23:24:44 Re: replication primary writting infinite number of WAL files
Previous Message Ron Johnson 2023-11-24 21:39:57 Re: pg_getnameinfo_all() failed: Temporary failure in name resolution