Re: could not link file in wal restore lines

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: michael(at)paquier(dot)xyz
Cc: zsolt(dot)ero(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: could not link file in wal restore lines
Date: 2022-07-25 08:11:32
Message-ID: 20220725.171132.2272594383346737093.horikyota.ntt@gmail.com
Lists: pgsql-bugs

At Sat, 23 Jul 2022 12:36:47 +0900, Michael Paquier <michael(at)paquier(dot)xyz> wrote in
> FWIW, the backend code has protections to prevent *exactly* this kind
> of problems when recycling WAL segment files at checkpoints with a set
> of LWLocks taken on the control file, for one. Perhaps you have
> messed up things and you have finished in such a state that backrest
> writes to pg_wal/ concurrently with a cluster running and running a
> checkpoint, which would explain those link() calls to be failing?

That lock doesn't seem to exclude recovery.

I can reproduce this with the script below after adding a sleep before
(or after) the durable_link_or_rename() call in InstallXLogFileSegment()
(patch attached). Some adjustment might be required to reproduce it in
other environments.
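The "File exists" at the end of the failing lines is the strerror() text for EEXIST: link(2) refuses to create a link whose destination name already exists, whereas rename(2) silently replaces the destination. The sketch below shows that difference with ln (hard links via link(2)/linkat(2)) and mv (rename(2) within one filesystem) on two plain files named after segments from the log; the names and contents are illustrative only. My reading, offered as an assumption rather than a confirmed diagnosis, is that the recycling process (PID 151762, presumably the checkpointer during a restartpoint) tries to link an old segment to a name that the startup process has just restored from the archive.

```shell
#!/bin/bash
# Illustration only: file names mimic WAL segment names from the log, but
# these are plain files in a scratch directory, not a real pg_wal.
export LC_ALL=C            # keep the error text in English ("File exists")
work=$(mktemp -d)
cd "$work" || exit 1

echo recycled > 000000010000000000000002   # stand-in for an old, reusable segment
echo restored > 000000010000000000000059   # stand-in for a segment restored from the archive

# Hard link: link(2) fails with EEXIST because the destination already exists.
ln 000000010000000000000002 000000010000000000000059 2> ln.err
ln_status=$?
echo "ln exit status: $ln_status"
cat ln.err

# The failed link leaves the existing destination untouched.
after_ln=$(cat 000000010000000000000059)
echo "after ln: $after_ln"

# Rename: mv (rename(2) within one filesystem) replaces the destination.
mv 000000010000000000000002 000000010000000000000059
echo "after mv: $(cat 000000010000000000000059)"
```

The exact wording of the ln error differs between GNU coreutils, BSD and BusyBox, but all of them end with "File exists" in the C locale.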

=====
2022-07-25 17:05:57.730 JST [151758] LOG: restored log file "000000010000000000000057" from archive
2022-07-25 17:05:57.760 JST [151758] LOG: restored log file "000000010000000000000058" from archive
2022-07-25 17:05:57.782 JST [151758] LOG: restored log file "000000010000000000000059" from archive
2022-07-25 17:05:57.790 JST [151762] LOG: could not link file "pg_wal/000000010000000000000002" to "pg_wal/000000010000000000000059": File exists
2022-07-25 17:05:57.802 JST [151758] LOG: restored log file "00000001000000000000005A" from archive
2022-07-25 17:05:58.294 JST [151762] LOG: could not link file "pg_wal/000000010000000000000003" to "pg_wal/00000001000000000000005A": File exists

========
#! /bin/bash

# create a backup-source
PGDATA=~/test/data
PGARC=~/test/arc
BKDIR=~/test/bk
CPDATA=~/test/dt

# clean up leftovers from a previous run (-f: no error if absent)
rm -f /tmp/hoge
rm -rf $PGDATA $PGARC $BKDIR $CPDATA
mkdir $PGARC
killall -9 postgres

initdb -D $PGDATA
echo "archive_mode=on" >> $PGDATA/postgresql.conf
echo "archive_command = 'cp %p $PGARC/%f'" >> $PGDATA/postgresql.conf

# start the source
pg_ctl -D $PGDATA start

# take a backup
pg_basebackup -D $BKDIR
echo "archive_mode=off" >> $BKDIR/postgresql.conf
echo "restore_command='cp $PGARC/%f %p'" >> $BKDIR/postgresql.conf
touch $BKDIR/recovery.signal

# create archived segments
psql -c 'create table t (a int)'
for i in $(seq 1 100); do psql -c 'insert into t values(0); select pg_switch_wal()'; done

# stop the source
pg_ctl -D $PGDATA stop

# start recovery
rm -rf $CPDATA
cp -r $BKDIR $CPDATA
# /tmp/hoge is presumably checked by the attached patch to enable the sleep
touch /tmp/hoge
postgres -D $CPDATA 2>&1 | tee recovery.log
======
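In the log above, each failed link targets a segment (…59, …5A) that the startup process (PID 151758) had restored from the archive shortly before. The helper below is a hypothetical addition, not part of the original report, that checks this pattern mechanically in the recovery.log written by the script's tee; it assumes the log line format shown above.

```shell
#!/bin/bash
# Hypothetical helper: for each "could not link file" line in a recovery log,
# report whether the link target was already restored from the archive
# earlier in the same log.
check_link_failures() {
  local log=${1:-recovery.log}
  awk '
    /restored log file "/ {
      # a[2] is the quoted segment name; remember it as restored.
      split($0, a, "\""); restored[a[2]] = 1
    }
    /could not link file "/ {
      # a[2] is the source path, a[4] the target path under pg_wal/.
      split($0, a, "\""); seg = a[4]; sub(/^pg_wal\//, "", seg)
      status = (seg in restored) ? "already-restored" : "not-restored"
      print seg, status
    }
  ' "$log"
}
```

Usage: run `check_link_failures recovery.log` after the repro; with the log above, both failing targets should be reported as already-restored.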

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
repro20220725.diff text/x-patch 577 bytes
