Re: Change log level for notifying hot standby is waiting non-overflowed snapshot

From: torikoshia <torikoshia(at)oss(dot)nttdata(dot)com>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
Date: 2025-03-03 15:20:11
Message-ID: 61e1a4160566f07ec7b792013cce1195@oss.nttdata.com
Lists: pgsql-hackers

On 2025-03-03 13:10, Fujii Masao wrote:

Thanks for your comments!

> On 2025/02/03 22:35, torikoshia wrote:
>> Hi,
>>
>> When a hot standby is restarted in a state where subtransactions have
>> overflowed, it may become inaccessible:
>>
>>   $ psql: error: connection to server at "localhost" (::1), port 5433
>> failed: FATAL:  the database system is not yet accepting connections
>>           DETAIL:  Consistent recovery state has not been yet
>> reached.
>
> Could you share the steps to reproduce this situation?

We can reproduce this situation using the following procedure.
I performed this test with one asynchronous standby server.

-- overflow subtransaction
(primary)=# create table t1 (i int);
(primary)=# select 'insert into t1 values (1); savepoint s_' ||
generate_series(1, 70) ; \gexec
(primary)=# checkpoint;

-- restart standby
$ pg_ctl restart -D data_stb/
waiting for server to shut down.... done
server stopped
waiting for server to start.... LOG: redirecting log output to logging collector process
........................................................... stopped waiting
pg_ctl: server did not start in time

-- standby log
DEBUG: recovery snapshot waiting for non-overflowed snapshot or until oldest active xid on standby is at least 887 (now 818)

>> However, the log message that indicates the cause of this issue seems
>> to be only output at the DEBUG1 level:
>>
>>   elog(DEBUG1,
>>        "recovery snapshot waiting for non-overflowed snapshot or "
>>        "until oldest active xid on standby is at least %u (now %u)",
>>        standbySnapshotPendingXmin,
>>        running->oldestRunningXid);
>>
>> I believe this message would be useful not only for developers but
>> also for users.
>
> Isn't this log message too difficult for most users? It seems to
> describe PostgreSQL's internal mechanisms, making it hard
> for users to understand the issue and what actions to take.

Agreed. I feel that adding a hint such as "check if there are any
overflowing transactions on the primary side" would make the message
useful.
On the other hand, the manual's explanation of
pg_stat_get_backend_subxact() does not mention subtransaction overflow,
so I am not sure how much detail should be included.
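
As a rough sketch of what such a check on the primary might look like,
the query below lists each backend together with the record returned by
pg_stat_get_backend_subxact(). It assumes PostgreSQL 16 or later, where
that function is available; the aliases (b, x, backendid, pid) are
illustrative only and not part of any proposed message text.

```sql
-- Run on the primary. For every backend, show its PID and the
-- subtransaction-cache record returned by pg_stat_get_backend_subxact():
-- the number of cached subtransactions and a boolean flag that reports
-- whether the cache has overflowed.
SELECT pg_stat_get_backend_pid(b.backendid) AS pid,
       x.*
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS b,
     LATERAL pg_stat_get_backend_subxact(b.backendid) AS x;
```

A backend whose overflow flag is true while the standby is stuck would
be a candidate for the transaction the standby is waiting on.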

Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
