Quick Links

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly

From:	"Vitaly Davydov" <v(dot)davydov(at)postgrespro(dot)ru>
To:	tomas(at)vondra(dot)me
Cc:	"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date:	2024-11-21 14:56:05
Message-ID:	fb689-673f4a00-25-21dfc440@244947041
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

Hi Tomas,

Thank you for the reply and your interest to the investigation.

On Wednesday, November 20, 2024 20:24 MSK, Tomas Vondra <tomas(at)vondra(dot)me> wrote:

> If an existing physical slot is advanced in the middle of checkpoint
> execution, WAL segments, which are related to saved on disk restart LSN
> may be removed. It is because the calculation of the replication slot
> miminal LSN is occured at the end of checkpoint, prior to old WAL
> segments removal. If to hard stop (pg_stl -m immediate) the postgres
> instance right after checkpoint and to restart it, the slot's
> restart_lsn may point to the removed WAL segment. I believe, such
> behaviour is not good.
>

Not sure I 100% follow, but let me rephrase, just so that we're on the
same page. CreateCheckPoint() does this:

... something ...

CheckPointGuts(checkPoint.redo, flags);

... something ...

RemoveOldXlogFiles(_logSegNo, RedoRecPtr, recptr,
checkPoint.ThisTimeLineID);

The slots get synced in CheckPointGuts(), so IIUC you're saying if the
slot gets advanced shortly after the sync, the RemoveOldXlogFiles() may
remove still-needed WAL, because we happen to consider a fresh
restart_lsn when calculating the logSegNo. Is that right?
They key action here is to restart the instance with -m immediate (or kill it and start it again) right after checkpoint. After restart, the slot's restart_lsn will be read from the disk and address to the removed WAL segment, if it's LSN was davanced enought to switch to a new WAL segment.

I tried to reproduce the issue using a stress test (checkpoint+restart
in a loop), but so far without success :-(

Can you clarify where exactly you added the pg_usleep(), and how long
are the waits you added? I wonder if the sleep is really needed,
considering the checkpoints are spread anyway. Also, what you mean by
"hard reset"?

I added pg_usleep as show below (in CheckPointBuffers function):

CheckPointBuffers(int flags)
{
BufferSync(flags);
+ pg_usleep(10000000);
}

Below is the instruction how I run my test (pg_usleep should be added to the code):

CONSOLE> initdb -D pgdata
CONSOLE> pg_ctl -D pgdata -l logfile start
... open two psql terminals and connect to the database (lets call them as PSQL-1, PSQL-2)
PSQL-1> select pg_create_physical_replication_slot('myslot', true, false);
CONSOLE> pgbench -i -s 10 postgres # create some WAL records
PSQL-1> checkpoint; -- press ENTER key and go to PSQL-2 console and execute next line in 1 second
PSQL-2> select pg_replication_slot_advance('myslot', pg_current_wal_lsn()); -- advance repslot during checkpoint
... wait for checkpoint to complete
CONSOLE> pg_ctl -D pgdata -m immediate stop
CONSOLE> pg_ctl -D pgdata start
PSQL-1> \c
PSQL-1> create extension pg_walinspect;
PSQL-1> select pg_get_wal_record_info(restart_lsn) from pg_replication_slots where slot_name = 'myslot';
ERROR: requested WAL segment pg_wal/000000010000000000000001 has already been removed

I'm trying to create a perl test to reproduce it. Please, give me some time to create the test script.

I kept thinking about this (sorry it's this incremental), particularly
if this applies to logical replication too. And AFAICS it does not, or
at least not to this extent.

Yes, it is not applied to logical replication, because logical slot is synced when advancing.

eah, a simple way to fix this might be to make sure we don't use the
required LSN value set after CheckPointReplicationSlots() to remove WAL.
AFAICS the problem is KeepLogSeg() gets the new LSN value by:

keep = XLogGetReplicationSlotMinimumLSN();

Let's say we get the LSN before calling CheckPointGuts(), and then pass
it to KeepLogSeg, so that it doesn't need to get the fresh value.

Yes, it is another solution and it can fix the problem. The question - which solution to choose. Well, I prefer to add a new in-memory state variable in the slot structure. Such variable may be useful if we want to check whether the slot data is synced or not. The calculation of the keep value before CheckPointGuts(), IMHO, requires to change signatures of a number of functions. I may prepare a new patch where your solution is implemented.

I'm sorry, if I missed to answer to some other questions. I will answer later.

With best regards,
Vitaly

In response to

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly at 2024-11-20 17:24:40 from Tomas Vondra

Responses

Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly at 2024-11-21 16:13:23 from Давыдов Виталий

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	yuansong	2024-11-21 15:02:46	backup server core when redo btree_xlog_insert that type is XLOG_BTREE_INSERT_POST
Previous Message	Dave Page	2024-11-21 14:53:04	Re: Windows 2016 server crashed after changes in Postgres 15.8 pgAdmin