From: | Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Checkpointer crashes with "PANIC: could not fsync file "pg_tblspc/.." |
Date: | 2021-12-21 11:17:23 |
Message-ID: | CAFiTN-szX=ayO80EnSWonBu1YMZrpOr9V0R3BzHBSjMrMPAeMg@mail.gmail.com |
Lists: | pgsql-hackers |
While testing the case below on a hot standby setup (with the
latest code), I noticed that the checkpointer process crashed
with the $subject error. From what I can see, we register a
SYNC_REQUEST when inserting tuples into the table, and later, on
ALTER TABLE ... SET TABLESPACE, we register a SYNC_UNLINK_REQUEST,
which looks fine so far. However, only when the standby is connected,
the underlying table file in the old tablespace has already been
deleted by that point. In AbsorbFsyncRequests we don't do anything
about the SYNC_REQUEST even though we have a SYNC_UNLINK_REQUEST for
the same file, and since the underlying file is already gone, the
checkpointer crashes while processing the SYNC_REQUEST.
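To make the ordering problem concrete, here is a minimal standalone sketch of it. This is a toy model, not PostgreSQL code: the names ToyRequest, toy_process_request and toy_run_scenario are hypothetical, and it only assumes that a sync request is served by opening and fsyncing the file by path, and that an unlink request for the same file is queued behind it. If something removes the file before the queue is processed, the sync step fails with ENOENT, which is the errno shown in the PANIC message.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical request kinds, named after the ones discussed above. */
typedef enum { TOY_SYNC_REQUEST, TOY_SYNC_UNLINK_REQUEST } ToyRequestType;

typedef struct
{
    ToyRequestType type;
    const char *path;
} ToyRequest;

/*
 * Serve one queued request: open + fsync for a sync request,
 * unlink for an unlink request.
 * Returns 0 on success, or the errno value on failure.
 */
static int
toy_process_request(const ToyRequest *req)
{
    if (req->type == TOY_SYNC_REQUEST)
    {
        int fd = open(req->path, O_RDWR);

        if (fd < 0)
            return errno;       /* ENOENT if the file is already gone */

        int rc = fsync(fd);
        int saved_errno = errno;

        close(fd);
        return rc == 0 ? 0 : saved_errno;
    }

    return unlink(req->path) == 0 ? 0 : errno;
}

/*
 * Create a file, queue a sync request followed by an unlink request,
 * and optionally remove the file before the queue is processed.
 * Returns the result of the sync request.
 */
static int
toy_run_scenario(const char *path, int removed_early)
{
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0600);

    assert(fd >= 0);
    if (write(fd, "aaa", 3) != 3)
    {
        close(fd);
        return -1;
    }
    close(fd);

    ToyRequest queue[2] = {
        {TOY_SYNC_REQUEST, path},
        {TOY_SYNC_UNLINK_REQUEST, path}
    };

    /* Something other than the queue consumer deletes the file. */
    if (removed_early)
        unlink(path);

    int sync_result = toy_process_request(&queue[0]);

    /* Process the unlink request too, so no file is left behind. */
    (void) toy_process_request(&queue[1]);

    return sync_result;
}
```

Calling toy_run_scenario("toy_relfile", 0) processes both requests cleanly, while toy_run_scenario("toy_relfile", 1) returns ENOENT from the sync step. The sketch says nothing about who removes the file early in the real system; that is the open question here.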
I have spent some time on this but could not figure out how the
relfilenode file in the old tablespace is getting deleted. If I
disconnect the standby, it does not get deleted, and I am not sure
what role the walsender plays in deleting the file before the
checkpointer processes the unlink request.
postgres[8905]=# create tablespace tab location
'/home/dilipkumar/work/PG/install/bin/test';
CREATE TABLESPACE
postgres[8905]=# create tablespace tab1 location
'/home/dilipkumar/work/PG/install/bin/test1';
CREATE TABLESPACE
postgres[8905]=# create database test tablespace tab;
CREATE DATABASE
postgres[8905]=# \c test
You are now connected to database "test" as user "dilipkumar".
test[8912]=# create table t( a int PRIMARY KEY,b text);
CREATE TABLE
test[8912]=# insert into t values (generate_series(1,10), 'aaa');
INSERT 0 10
test[8912]=# alter table t set tablespace tab1 ;
ALTER TABLE
test[8912]=# CHECKPOINT ;
WARNING: 57P02: terminating connection because of crash of another
server process
log shows:
PANIC: could not fsync file
"pg_tblspc/16384/PG_15_202112131/16386/16387": No such file or
directory
backtrace:
#0 0x00007f2f865ff387 in raise () from /lib64/libc.so.6
#1 0x00007f2f86600a78 in abort () from /lib64/libc.so.6
#2 0x0000000000b13da3 in errfinish (filename=0xcf283f "sync.c", ..
#3 0x0000000000978dc7 in ProcessSyncRequests () at sync.c:439
#4 0x00000000005949d2 in CheckPointGuts (checkPointRedo=67653624,
flags=108) at xlog.c:9590
#5 0x00000000005942fe in CreateCheckPoint (flags=108) at xlog.c:9318
#6 0x00000000008a80b7 in CheckpointerMain () at checkpointer.c:444
Note: This smaller test case is derived from one of the bigger
scenarios raised by Neha Sharma [1]
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com