From: | Paul Guo <pguo(at)pivotal(dot)io> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | standby recovery fails (tablespace related) (tentative patch and discussion) |
Date: | 2019-04-17 07:56:30 |
Message-ID: | CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com |
Lists: | pgsql-hackers |
Hello postgres hackers,
Recently my colleagues and I encountered an issue: a standby cannot
recover after an unclean shutdown, and the cause is related to tablespaces.
The standby re-replays some xlog records that need tablespace
directories (e.g. creating a database in a tablespace),
but those tablespace directories have already been removed by the
previous replay.
In detail, the standby finishes replaying the operations below
normally, but because of the unclean shutdown the redo LSN
in pg_control is not updated and still points before the 'create
db with tablespace' xlog record. Since the tablespace
directories have already been removed, replaying the database-create
WAL record reports an error.
create db with tablespace
drop database
drop tablespace.
Here is the log on the standby.
2019-04-17 14:52:14.926 CST [23029] LOG: starting PostgreSQL 12devel on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat
4.8.5-4), 64-bit
2019-04-17 14:52:14.927 CST [23029] LOG: listening on IPv4 address
"192.168.35.130", port 5432
2019-04-17 14:52:14.929 CST [23029] LOG: listening on Unix socket
"/tmp/.s.PGSQL.5432"
2019-04-17 14:52:14.943 CST [23030] LOG: database system was interrupted
while in recovery at log time 2019-04-17 14:48:27 CST
2019-04-17 14:52:14.943 CST [23030] HINT: If this has occurred more than
once some data might be corrupted and you might need to choose an earlier
recovery target.
2019-04-17 14:52:14.949 CST [23030] LOG: entering standby mode
2019-04-17 14:52:14.950 CST [23030] LOG: redo starts at 0/30105B8
2019-04-17 14:52:14.951 CST [23030] FATAL: could not create directory
"pg_tblspc/65546/PG_12_201904072/65547": No such file or directory
2019-04-17 14:52:14.951 CST [23030] CONTEXT: WAL redo at 0/3011650 for
Database/CREATE: copy dir 1663/1 to 65546/65547
2019-04-17 14:52:14.951 CST [23029] LOG: startup process (PID 23030)
exited with exit code 1
2019-04-17 14:52:14.951 CST [23029] LOG: terminating any other active
server processes
2019-04-17 14:52:14.953 CST [23029] LOG: database system is shut down
Steps to reproduce:
1. Set up a master and a standby.
2. On both sides, run: mkdir /tmp/some_isolation2_pg_basebackup_tablespace
3. Run SQLs:
drop tablespace if exists some_isolation2_pg_basebackup_tablespace;
create tablespace some_isolation2_pg_basebackup_tablespace location
'/tmp/some_isolation2_pg_basebackup_tablespace';
4. Cleanly shut down and restart both postgres instances.
5. Run the following SQLs:
drop database if exists some_database_with_tablespace;
create database some_database_with_tablespace tablespace
some_isolation2_pg_basebackup_tablespace;
drop database some_database_with_tablespace;
drop tablespace some_isolation2_pg_basebackup_tablespace;
\! pkill -9 postgres; ssh host70 pkill -9 postgres
Note that an immediate shutdown via pg_ctl should also reproduce the issue,
and the above steps probably do not reproduce it 100% of the time.
I created an initial patch for this issue (see the attachment). The idea is
to re-create those directories recursively. The issue above exists
in dbase_redo(),
but TablespaceCreateDbspace() (used when redoing relation file creation) is
probably buggy as well, so I modified that function too. Even if there is no bug
in that function, using a simple pg_mkdir_p() seems cleaner. Note that while
reading TablespaceCreateDbspace() I found that this issue
has apparently already been considered, though insufficiently. Frankly, this solution
(directory re-creation) does not seem perfect, since this should really
have been the responsibility of tablespace creation (tablespace
creation also does more, such as symlink creation). Also, I'm not sure whether
we need to use the invalid page mechanism (see xlogutils.c).
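To make the "re-create those directories recursively" idea concrete, here is a minimal standalone sketch of recursive directory creation. It is not the attached patch and not PostgreSQL's pg_mkdir_p(); the function name mkdir_p_sketch() is hypothetical, and the code only assumes a POSIX mkdir()/stat().

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical illustration only (not pg_mkdir_p() and not the patch).
 * Creates every missing component of "path". Returns 0 on success and
 * -1 on failure with errno set.
 */
int
mkdir_p_sketch(const char *path, mode_t mode)
{
    size_t      len = strlen(path);
    char       *buf;
    char       *p;
    struct stat st;
    int         rc = -1;

    if (len == 0)
    {
        errno = ENOENT;
        return -1;
    }

    /* Work on a private copy so separators can be cut temporarily. */
    buf = malloc(len + 1);
    if (buf == NULL)
        return -1;
    memcpy(buf, path, len + 1);

    /* Create each intermediate component; an existing one is fine. */
    for (p = buf + 1; *p != '\0'; p++)
    {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, mode) != 0 && errno != EEXIST)
            goto done;
        *p = '/';
    }

    /* Create the final component, accepting an existing directory. */
    if (mkdir(buf, mode) != 0)
    {
        if (errno != EEXIST || stat(buf, &st) != 0)
            goto done;
        if (!S_ISDIR(st.st_mode))
        {
            errno = ENOTDIR;
            goto done;
        }
    }
    rc = 0;

done:
    free(buf);
    return rc;
}
```

With a helper of this shape, creating a path like the "pg_tblspc/65546/PG_12_201904072/65547" one from the log would no longer fail with "No such file or directory" when a parent is missing. It would, however, create plain directories where the tablespace symlink used to be, which is the imperfection noted above.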
Another solution: we already create a checkpoint on
createdb/movedb/dropdb/droptablespace, so maybe we should force the standby to
create a restartpoint for this special kind of checkpoint WAL record. That
means setting a flag in the checkpoint WAL record and letting the checkpoint redo
code create a restartpoint if that flag is set. This solution seems to be
safer.
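A rough sketch of the flag idea, as a toy model rather than real PostgreSQL code: the struct names, the flag SKETCH_CKPT_FORCE_RESTARTPOINT, and the function sketch_checkpoint_redo() are all hypothetical, and the real checkpoint record layout and restartpoint logic are not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit, invented for this sketch. */
#define SKETCH_CKPT_FORCE_RESTARTPOINT 0x01

/* Toy stand-in for a checkpoint WAL record. */
typedef struct SketchCheckpointRecord
{
    uint64_t    redo_lsn;       /* where replay would restart from */
    uint32_t    flags;          /* set by createdb/dropdb/etc. on the master */
} SketchCheckpointRecord;

/* Toy stand-in for the standby's persistent state. */
typedef struct SketchStandbyState
{
    uint64_t    control_redo_lsn;   /* models the redo LSN in pg_control */
    int         restartpoints_done;
} SketchStandbyState;

/*
 * Replay one checkpoint record on the standby. Returns true if a
 * restartpoint was forced because the record carried the flag.
 */
bool
sketch_checkpoint_redo(SketchStandbyState *standby,
                       const SketchCheckpointRecord *rec)
{
    /* Ordinary checkpoint: restartpoint is left to normal scheduling. */
    if ((rec->flags & SKETCH_CKPT_FORCE_RESTARTPOINT) == 0)
        return false;

    /* Flagged checkpoint: advance the persisted redo LSN immediately. */
    standby->control_redo_lsn = rec->redo_lsn;
    standby->restartpoints_done++;
    return true;
}
```

The point of the model is that once the flagged record has been replayed, a crash can no longer send replay back to a point before the 'create db with tablespace' record.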
Thanks,
Paul
Attachment | Content-Type | Size |
---|---|---|
0001-Recursively-create-tablespace-directories-if-those-a.patch | application/octet-stream | 3.7 KB |