From: | Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM Corruption (was: Could not read block at end of the relation) |
Date: | 2024-04-15 05:33:49 |
Message-ID: | 12418161.O9o76ZdvQC@aivenlaptop |
Lists: | pgsql-bugs |
On Saturday, 13 April 2024, 19:15:28 CEST, Noah Misch wrote:
> On Thu, Apr 11, 2024 at 08:38:43AM -0700, Noah Misch wrote:
> > On Thu, Apr 11, 2024 at 09:36:50AM +0200, Ronan Dunklau wrote:
> > > On Sunday, 7 April 2024, 00:30:37 CEST, Noah Misch wrote:
> > > > Your v3 has the right functionality. As further confirmation of the
> > > > fix, I tried reverting the non-test parts of commit 917dc7d "Fix
> > > > WAL-logging of FSM and VM truncation". That commit's
> > > > 008_fsm_truncation.pl fails with 917dc7d reverted from master, and
> > > > adding this patch makes it pass again. I ran pgindent and edited
> > > > comments. I think the attached version is ready to go.
> > >
> > > Thank you Noah, the updated comments are much better. I think it should
> > > be backported at least to 16 since the chances of tripping on that
> > > behaviour are quite high here, but what about previous versions?
> >
> > It should be reachable in all branches, just needing concurrent extension
> > lock waiters to reach before v16. Hence, my plan is to back-patch it all
> > the way. It applies with negligible conflicts back to v12.
>
> While it applied, it doesn't build in v12 or v13, due to smgr_cached_nblocks
> first appearing in c5315f4. Options:
>
> 1. Back-patch the addition of smgr_cached_nblocks or equivalent.
> 2. Stop the back-patch of $SUBJECT at v14.
> 3. Incur more lseek() in v13 and v12.
>
> Given the lack of reports before v16, (3) seems too likely to be a cure
> worse than the disease. I'm picking (2) for today. We could do (1)
> tomorrow, but I lean toward (2) until someone reports the problem on v13 or
> v12. The problem's impact is limited to DML giving ERROR when it should
> have succeeded, and I expect VACUUM FULL is a workaround. Without those
> mitigating factors, I would choose (1).
>
> Pushed that way, as 9358297.
I agree with you that option 2 seems to be the safest course of action. For
the record, the other options available when that happens are to stop PG and
manually remove the FSM from disk (see
https://wiki.postgresql.org/wiki/Free_Space_Map_Problems) or to adapt the patch
submitted here to do it online:
https://www.postgresql.org/message-id/flat/5446938.Sb9uPGUboI%40aivenlaptop
Thank you for all your work refining, testing and finally committing the fix.
Best regards,
--
Ronan Dunklau