From: | Stepan Neretin <slpmcf(at)gmail(dot)com> |
---|---|
To: | "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM doesn't recover after zeroing damaged page. |
Date: | 2025-03-10 07:58:24 |
Message-ID: | CA+Yyo5R-5A3+dWsS651vEKuGeTW2i5OMMYCfpS2=W8q4s_spng@mail.gmail.com |
Lists: | pgsql-hackers |
On Fri, Feb 7, 2025 at 7:15 AM Anton A. Melnikov <a(dot)melnikov(at)postgrespro(dot)ru> wrote:
> Hi!
>
> On current master I found that if a page of the FSM bottom layer other
> than the last one is corrupted, it is not restored after zeroing.
>
> Here is a reproduction:
> 1) Create a table whose FSM has 4 pages:
> create table t (int) as select * from generate_series(1, 1E6);
> delete from t where ctid in (select ctid from t tablesample bernoulli (20));
> SELECT pg_relation_filepath('t'); -- the FSM file is this path plus the _fsm suffix
> vacuum t;
>
> 2) Run a checkpoint and stop the server.
>
> 3) Corrupt a byte in the third page, for instance the low-order byte of the
> page checksum:
> printf '\xAA' | dd of=/usr/local/pg12252-vanm/data/base/5/<filename_fsm> bs=1 seek=$((2*8192+8)) count=1 conv=notrunc
>
> 4) Start the server and run vacuum t; twice, to make sure the corrupted page
> has been zeroed in memory and a new header written to it.
>
> postgres=# vacuum t;
> WARNING: page verification failed, calculated checksum 13869 but expected 13994
> WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing out page
> VACUUM
> postgres=# vacuum t; -- without warnings
> VACUUM
>
> 5) Run a checkpoint and restart the server. After vacuum t; the warnings
> appear again:
> postgres=# vacuum t;
> WARNING: page verification failed, calculated checksum 13869 but expected 13994
> WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing out page
> VACUUM
>
> I noticed that the updated page is not written to disk because the
> buffer holding it is not marked dirty. Moreover, MarkBufferDirtyHint(),
> which is called for modified FSM pages, does not seem suitable here,
> since, as I understand it, the corrupted page must definitely be
> rewritten, not merely flagged as a hint.
> So maybe we should mark the buffer dirty immediately after writing the new header?
> Here is a small patch that does this and eliminates the repeated warnings.
> I would be glad if you took a look at it.
>
> With the best regards,
>
> --
> Anton A. Melnikov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
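As a sanity check on step 3, the following Python sketch reproduces the dd seek offset and shows how the two checksums in the warnings relate. It assumes the default BLCKSZ of 8192, a little-endian host, and the standard page header layout in which the 8-byte pd_lsn (two uint32 halves) is immediately followed by the 16-bit pd_checksum; the fork is simulated in memory and no real file is touched. Because PostgreSQL computes a page checksum with pd_checksum itself treated as zero, the "calculated" value 13869 (0x362D) is the page's original checksum, while the "expected" value 13994 (0x36AA) is what is now stored: only the low-order byte differs, and it is exactly the 0xAA written by dd.

```python
import struct

BLCKSZ = 8192  # assumed: default block size of a standard build
# pd_lsn is stored as two uint32 halves (xlogid, xrecoff); pd_checksum follows it.
CHECKSUM_OFFSET = struct.calcsize("<II")  # 8


def checksum_byte_offset(block_no, blcksz=BLCKSZ):
    """File offset of the low-order byte of pd_checksum (little-endian host)."""
    return block_no * blcksz + CHECKSUM_OFFSET


def read_checksum(fork, block_no, blcksz=BLCKSZ):
    """Decode the stored 16-bit pd_checksum of one block from raw fork bytes."""
    (value,) = struct.unpack_from("<H", fork, block_no * blcksz + CHECKSUM_OFFSET)
    return value


# Simulated 4-page FSM fork; block 2 initially stores its correct checksum 13869.
fork = bytearray(4 * BLCKSZ)
struct.pack_into("<H", fork, 2 * BLCKSZ + CHECKSUM_OFFSET, 13869)

# In-memory equivalent of: printf '\xAA' | dd ... bs=1 seek=$((2*8192+8)) count=1 conv=notrunc
fork[checksum_byte_offset(2)] = 0xAA

print(checksum_byte_offset(2))                   # 16392
print(hex(13869), hex(read_checksum(fork, 2)))   # 0x362d 0x36aa
print(read_checksum(fork, 2))                    # 13994
```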
Hi, I reproduced the problem step by step. The patch fixes it. Looks
good to me.
Best Regards, Stepan Neretin.
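To make the mechanism described in the report concrete, here is a toy model in Python. All class and function names are hypothetical; this is not PostgreSQL's buffer manager API. It captures only the one rule that matters here: a checkpoint writes back buffers that are flagged dirty, so a page that was zeroed and given a new header only in memory is lost on restart, and the corruption is reported again.

```python
CORRUPT = b"corrupt page"
REINITIALIZED = b"zeroed page with new header"


class ToyBufferPool:
    """Hypothetical model: pages are cached in memory, and a checkpoint
    writes back only buffers whose dirty flag is set."""

    def __init__(self, disk, mark_dirty_on_reinit):
        self.disk = disk                          # blkno -> bytes; survives restarts
        self.mark_dirty_on_reinit = mark_dirty_on_reinit
        self.buffers = {}                         # blkno -> [page, dirty]
        self.warnings = 0

    def read(self, blkno):
        if blkno not in self.buffers:
            page, dirty = self.disk[blkno], False
            if page == CORRUPT:
                self.warnings += 1                # "invalid page ...; zeroing out page"
                page = REINITIALIZED              # zero the page, write a new header
                dirty = self.mark_dirty_on_reinit # the proposed fix marks it dirty here
            self.buffers[blkno] = [page, dirty]
        return self.buffers[blkno][0]

    def checkpoint(self):
        # Only dirty buffers reach disk.
        for blkno, entry in self.buffers.items():
            if entry[1]:
                self.disk[blkno] = entry[0]
                entry[1] = False

    def restart(self):
        self.buffers.clear()                      # in-memory buffers are lost


def warnings_after_restart(mark_dirty_on_reinit):
    """Replay steps 4-5 of the report on a 4-page fork with block 2 corrupt."""
    disk = {0: b"ok", 1: b"ok", 2: CORRUPT, 3: b"ok"}
    pool = ToyBufferPool(disk, mark_dirty_on_reinit)
    pool.read(2)          # first vacuum: warning, page zeroed in memory
    pool.read(2)          # second vacuum: no warning, page is cached
    pool.checkpoint()
    pool.restart()
    before = pool.warnings
    pool.read(2)          # vacuum after restart
    return pool.warnings - before


print(warnings_after_restart(False))  # 1: corruption is reported again
print(warnings_after_restart(True))   # 0: the reinitialized page reached disk
```

In the actual patch the counterpart of `mark_dirty_on_reinit=True` is, presumably, a plain MarkBufferDirty() call right after the new page header is written, rather than relying on MarkBufferDirtyHint().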