From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | Jakub(dot)Wartak(at)tomtom(dot)com |
Cc: | tsunakawa(dot)takay(at)fujitsu(dot)com, osumi(dot)takamichi(at)fujitsu(dot)com, sfrost(at)snowman(dot)net, masao(dot)fujii(at)oss(dot)nttdata(dot)com, ashutosh(dot)bapat(dot)oss(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: In-placre persistance change of a relation |
Date: | 2021-12-22 06:13:27 |
Message-ID: | 20211222.151327.439673660364783186.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
Hello, Jakub.
At Tue, 21 Dec 2021 13:07:28 +0000, Jakub Wartak <Jakub(dot)Wartak(at)tomtom(dot)com> wrote in
> So what's suspicious is that 122880 -> 0 file size truncation. I've investigated WAL and it seems to contain TRUNCATE records
> after logged FPI images, so when the crash recovery would kick in it probably clears this table (while it shouldn't).
Darn.. It was silly of me: I wrongly issued truncate records for the
target relation of the function (rel) instead of the relation on which
we're currently operating at that time (r).
> However if I perform CHECKPOINT just before crash the WAL stream contains just RUNNING_XACTS and CHECKPOINT_ONLINE
> redo records, this probably prevents truncating. I'm newbie here so please take this theory with grain of salt, it can be
> something completely different.
It is because the WAL records are inconsistent with the on-disk state.
If a crash happens after the SET LOGGED but before the next checkpoint,
recovery replays the broken WAL records. Once a checkpoint has
completed, however, the on-disk state is persisted and the broken WAL
records are no longer replayed.
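The effect can be sketched with a toy model, not PostgreSQL code. It
assumes only that crash recovery replays records starting at the redo
location of the last checkpoint. All names below (ToyWalRec,
toy_recover, the record types) are hypothetical and exist only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record kinds; real WAL has many more. */
typedef enum { TOY_FPI, TOY_TRUNCATE } ToyRecType;

/* Hypothetical, heavily simplified WAL record. */
typedef struct ToyWalRec {
    unsigned long lsn;     /* position of the record in the toy WAL */
    ToyRecType    type;
    size_t        nblocks; /* for TOY_FPI: file length implied by the image */
} ToyWalRec;

/*
 * Replay every record at or after redo_lsn on top of the on-disk file
 * length and return the resulting length in blocks.  Records before
 * redo_lsn are skipped, as they would be after a checkpoint.
 */
static size_t
toy_recover(const ToyWalRec *wal, size_t n,
            unsigned long redo_lsn, size_t on_disk_blocks)
{
    size_t blocks = on_disk_blocks;

    assert(wal != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (wal[i].lsn < redo_lsn)
            continue;               /* already covered by the checkpoint */
        if (wal[i].type == TOY_TRUNCATE)
            blocks = 0;             /* the bogus truncate wipes the file */
        else if (blocks < wal[i].nblocks)
            blocks = wal[i].nblocks;
    }
    return blocks;
}
```

With an FPI record followed by a bogus truncate record, calling
toy_recover with a redo point before both records returns 0 blocks,
while a redo point after them leaves the 15 on-disk blocks alone
(122880 bytes at the default 8 kB block size).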
The following fix works.
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5478,7 +5478,7 @@ RelationChangePersistence(AlteredTableInfo *tab, char persistence,
xl_smgr_truncate xlrec;
xlrec.blkno = 0;
- xlrec.rnode = rel->rd_node;
+ xlrec.rnode = r->rd_node;
xlrec.flags = SMGR_TRUNCATE_ALL;
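
The mistake fixed by the hunk above can be shown in isolation with a
minimal sketch. ToyRel and log_truncates are hypothetical stand-ins,
not the structures or functions used in tablecmds.c; the point is only
the difference between naming the function's target (rel) and the
relation currently being processed (r).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a relation; not PostgreSQL's Relation. */
typedef struct ToyRel {
    unsigned int node;  /* stand-in for rd_node */
} ToyRel;

/*
 * For each relation in `all`, store in logged[i] the node that a
 * truncate record would name.  With use_target_bug != 0 every record
 * names `target`, reproducing the mix-up; otherwise each record names
 * the relation currently being processed.
 */
static void
log_truncates(const ToyRel *target, const ToyRel *all, size_t n,
              int use_target_bug, unsigned int *logged)
{
    assert(target != NULL && all != NULL && logged != NULL);
    for (size_t i = 0; i < n; i++)
    {
        const ToyRel *r = &all[i];  /* relation currently operated on */

        logged[i] = use_target_bug ? target->node : r->node;
    }
}
```

With three relations whose nodes are 100, 101 and 102 and the first one
as the target, the buggy variant records 100 three times, so replay
would truncate the target each time. The fixed variant records 100,
101 and 102.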
I made another change in this version. Previously, btree was the only
index AM processed in the in-place manner. In this version we do that
for all AMs except GiST. Maybe if gistGetFakeLSN behaved the same
way for permanent and unlogged indexes, we could skip the index rebuild
in exchange for some extra WAL records emitted while it is unlogged.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v11-0001-In-place-table-persistence-change.patch | text/x-patch | 75.3 KB |
v11-0002-New-command-ALTER-TABLE-ALL-IN-TABLESPACE-SET-LO.patch | text/x-patch | 11.2 KB |