Re: FSM Corruption (was: Could not read block at end of the relation)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io>, Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: FSM Corruption (was: Could not read block at end of the relation)
Date: 2024-04-12 03:07:49
Message-ID: CA+hUKGLGtG8VoTDipK_YvRhgP=qGHEFaSOPWctN-rQ5ftWQQMA@mail.gmail.com
Lists: pgsql-bugs

On Fri, Apr 12, 2024 at 4:01 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> Although it's not related to the problem you're working on, it seems
> like a good opportunity to bring up a concern about the FSM that I
> don't believe was discussed at any point in the past few years: I
> wonder if the way that fsm_search_avail() sometimes updates
> fsmpage->fp_next_slot with only a shared lock on the page could cause
> problems. At the very least, it's weird that we allow it.

Aha. Good to know. So that is another place where direct I/O on a
file system with checksums might get very upset, if it takes no
measures of its own to prevent the data from changing underneath it
during a pwrite() call. The only known system like that so far is
btrfs (phenomenon #1 in [1], see reproducer). The symptom is that the
next read fails with EIO.

[1] https://www.postgresql.org/message-id/CA%2BhUKGKSBaz78Fw3WTF3Q8ArqKCz1GgsTfRFiDPbu-j9OFz-jw%40mail.gmail.com
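
To make the failure mode above concrete, here is a toy simulation, not the reproducer from [1] and not a model of btrfs internals. It assumes a checksumming file system computes a checksum from the caller's buffer and transfers the data from that same buffer at a slightly later moment, so a hint update in between leaves the stored checksum and stored data inconsistent. The names simulated_direct_write, simulated_read and HINT_OFFSET are hypothetical, and zlib.crc32 is only a stand-in for whatever checksum the file system really uses.

```python
import errno
import zlib

PAGE_SIZE = 8192   # PostgreSQL's default block size
HINT_OFFSET = 0    # hypothetical byte position of a hint field such as fp_next_slot


def checksum(page: bytes) -> int:
    # Stand-in checksum; a real file system may use a different algorithm.
    return zlib.crc32(page)


def simulated_direct_write(page: bytearray, mutate_mid_write: bool):
    """Return (stored_data, stored_checksum) as the toy file system records them."""
    # Step 1: the checksum is computed from the caller's buffer (assumed ordering).
    stored_checksum = checksum(bytes(page))
    # Step 2: another backend updates a hint while holding only a shared lock.
    if mutate_mid_write:
        page[HINT_OFFSET] ^= 0x01
    # Step 3: the data is transferred from the same, possibly changed, buffer.
    stored_data = bytes(page)
    return stored_data, stored_checksum


def simulated_read(stored_data: bytes, stored_checksum: int) -> bytes:
    # Verification on read: a mismatch surfaces to the caller as EIO.
    if checksum(stored_data) != stored_checksum:
        raise OSError(errno.EIO, "checksum mismatch on read")
    return stored_data


if __name__ == "__main__":
    data, csum = simulated_direct_write(bytearray(PAGE_SIZE), mutate_mid_write=False)
    simulated_read(data, csum)  # stable buffer: read succeeds

    data, csum = simulated_direct_write(bytearray(PAGE_SIZE), mutate_mid_write=True)
    try:
        simulated_read(data, csum)
    except OSError as e:
        print("read failed:", errno.errorcode[e.errno])
```

With a stable buffer the read verifies cleanly; with a single bit flipped between the checksum and the data transfer, the next read fails, which matches the symptom described above.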
