FSM doesn't recover after zeroing damaged page.

From: "Anton A(dot) Melnikov" <a(dot)melnikov(at)postgrespro(dot)ru>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: FSM doesn't recover after zeroing damaged page.
Date: 2025-02-07 00:15:17
Message-ID: a61efc0b-9cfc-4f24-ac5d-ea6600d9ccbf@postgrespro.ru
Lists: pgsql-hackers

Hi!

On current master I found that if a page of the FSM bottom layer other
than the last one is corrupted, it is not restored after being zeroed.

Here is a reproduction:
1) Create a table whose FSM has 4 pages:
create table t (int) as select * from generate_series(1, 1E6);
delete from t where ctid in (select ctid from t tablesample bernoulli (20));
select pg_relation_filepath('t'); -- main fork path; the FSM file is this path with the _fsm suffix
vacuum t;

2) Run a checkpoint and stop the server.

3) Corrupt a byte in the third page (block 2), for instance the low-order byte of the page checksum (pd_checksum, at offset 8 of the page header):
printf '\xAA' | dd of=/usr/local/pg12252-vanm/data/base/5/<filename_fsm> bs=1 seek=$((2*8192+8)) count=1 conv=notrunc

4) Start the server and run vacuum t; twice, to make sure that the corrupted
page is fixed in memory: it is zeroed and a new header is written to it.

postgres=# vacuum t;
WARNING: page verification failed, calculated checksum 13869 but expected 13994
WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing out page
VACUUM
postgres=# vacuum t; -- without warnings
VACUUM

5) Run a checkpoint and restart the server. After vacuum t; the warnings appear again:
postgres=# vacuum t;
WARNING: page verification failed, calculated checksum 13869 but expected 13994
WARNING: invalid page in block 2 of relation base/5/16384_fsm; zeroing out page
VACUUM

I noticed that the updated page is never written to disk because the
buffer that holds it is not marked dirty. Moreover, MarkBufferDirtyHint(),
which is called for modified FSM pages, does not seem suitable here,
since I suppose the corrupted page must definitely be rewritten, not merely
as a hint. Therefore, maybe the buffer should be marked dirty immediately
after the new header is written?
Here is a small patch that does this and eliminates the repeated warnings.
I would be glad if you took a look at it.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment Content-Type Size
0001-Fix-recovering-damaged-FSM-pages.patch text/x-patch 906 bytes
