From: | Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM Corruption (was: Could not read block at end of the relation) |
Date: | 2024-04-11 07:36:50 |
Message-ID: | 5959995.31r3eYUQgx@aivenlaptop |
Lists: | pgsql-bugs |
On Sunday, 7 April 2024, 00:30:37 CEST, Noah Misch wrote:
> Your v3 has the right functionality. As further confirmation of the fix, I
> tried reverting the non-test parts of commit 917dc7d "Fix WAL-logging of FSM
> and VM truncation". That commit's 008_fsm_truncation.pl fails with 917dc7d
> reverted from master, and adding this patch makes it pass again. I ran
> pgindent and edited comments. I think the attached version is ready to go.
>
Thank you Noah, the updated comments are much better. I think it should be
backported at least to 16, since the chances of tripping on that behaviour are
quite high there, but what about previous versions?
> While updating comments in FreeSpaceMapPrepareTruncateRel(), I entered a
> rabbit hole about the comments 917dc7d left about torn pages. I'm sharing
> these findings just in case it helps a reader of the $SUBJECT patch avoid
> the same rabbit hole. Both fsm and vm read with RBM_ZERO_ON_ERROR, so I
> think they're fine with torn pages. Per the README sentences I'm adding,
> FSM could stop writing WAL. I'm not proposing that, but I do bet it's the
> right thing. visibilitymap_prepare_truncate() has mirrored fsm truncate
> since 917dc7d. The case for removing WAL there is clearer still, because
> parallel function visibilitymap_clear() does not write WAL. I'm attaching
> a WIP patch to remove visibilitymap_prepare_truncate() WAL. I'll abandon
> that or pursue it for v18, in a different thread.
That's an interesting finding.
> If I were continuing the benchmark study, I would try SSD, a newer kernel,
> and/or shared_buffers=48GB. Instead, since your perf results show only
> +0.01% CPU from new lseek() calls, I'm going to stop there and say it's
> worth taking the remaining risk that some realistic scenario gets a
> material regression from those new lseek() calls.
Agree with you here.
Many thanks,
--
Ronan Dunklau