From: | Ronan Dunklau <ronan(dot)dunklau(at)aiven(dot)io> |
---|---|
To: | Noah Misch <noah(at)leadboat(dot)com> |
Cc: | Michael Paquier <michael(at)paquier(dot)xyz>, pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: FSM Corruption (was: Could not read block at end of the relation) |
Date: | 2024-03-15 06:23:46 |
Message-ID: | 5784272.DvuYhMxLoT@aivenlaptop |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
Le mercredi 13 mars 2024, 17:55:23 CET Noah Misch a écrit :
> On Wed, Mar 13, 2024 at 10:37:21AM +0100, Ronan Dunklau wrote:
> > Le mardi 12 mars 2024, 23:08:07 CET Noah Misch a écrit :
> > > > I would argue this is fine, as corruptions don't happen often, and FSM
> > > > is
> > > > not an exact thing anyway. If we record a block as full, it will just
> > > > be
> > > > unused until the next vacuum adds it back to the FSM with a correct
> > > > value.
> > >
> > > If this happened only under FSM corruption, it would be fine. I was
> > > thinking about normal extension, with an event sequence like this:
> > >
> > > - RelationExtensionLockWaiterCount() returns 10.
> > > - Backend B1 extends by 10 pages.
> > > - B1 records 9 pages in the FSM.
> > > - B2, B3, ... B11 wake up and fetch from the FSM. In each of those
> > > backends, fsm_does_block_exist() returns false, because the cached
> > > relation
> > > size is stale. Those pages remain unused until VACUUM re-records them
> > > in
> > > the FSM.
> > >
> > > PostgreSQL added the multiple-block extension logic (commit 719c84c)
> > > from
> > > hard-won experience bottlenecking there. If fixing $SUBJECT will lose
> > > 1% of that change's benefit, that's probably okay. If it loses 20%, we
> > > should work to do better. While we could rerun the 2016 benchmark to
> > > see how the patch regresses it, there might be a simple fix.
> > > fsm_does_block_exist()> >
> > > could skip the cache when blknumber>=cached, like this:
> > > return((BlockNumberIsValid(smgr-
>smgr_cached_nblocks[MAIN_FORKNUM])
> >
> > &&
> >
> > > blknumber < smgr-
> > >
> > >smgr_cached_nblocks[MAIN_FORKNUM]) ||
> > >
> > > blknumber < RelationGetNumberOfBlocks(rel));
> > >
> > > How do you see it? That still leaves us with an avoidable lseek in B2,
> > > B3,
> > > ... B11, but that feels tolerable. I don't know how to do better
> > > without
> > > switching to the ReadBufferMode approach.
> >
> > That makes sense. I'm a bit worried about triggering additional lseeks
> > when we in fact have other free pages in the map, that would pass the
> > test. I'm not sure how relevant it is given the way we search the FSM
> > with fp_next_slot though...
>
> That's a reasonable thing to worry about. We could do wrong by trying too
> hard to use an FSM slot, and we could do wrong by not trying hard enough.
>
> > To address that, I've given a bit of thought about enabling / disabling
> > the
> > auto-repair behaviour with a flag in GetPageWithFreeSpace to distinguish
> > the cases where we know we have a somewhat up-to-date value compared to
> > the case where we don't (as in, for heap, try without repair, then get an
> > uptodate value to try the last block, and if we need to look at the FSM
> > again then ask for it to be repaired) but it brings way too much
> > complexity and would need careful thought for each AM.
> >
> > So please find attached a patch with the change you propose.
> >
> > Do you have a link to the benchmark you mention though to evaluate the
> > patch against it ?
>
> Here is one from the thread that created commit 719c84c:
> https://www.postgresql.org/message-id/flat/CA%2BTgmob7xED4AhoqLspSOF0wCMYEom
> gHfuVdzNJnwWVoE_c60g%40mail.gmail.com
>
> There may be other benchmarks earlier in that thread.
Thank you. I tried to run that benchmark, with 4 clients all copying 10M
pgbench_accounts row concurrently. With and without the patch, I ran the
benchmark 5 times, and took the average.
master: 15.05s
patched: 15.24s (+1.25%)
If I remove the best and worst run for each of those, the difference falls at
+0.7%.
From | Date | Subject | |
---|---|---|---|
Next Message | PG Bug reporting form | 2024-03-15 07:40:21 | BUG #18395: Checksum verification failed for: deb_postgis_3_4_pg16.app.zip |
Previous Message | Bruce Momjian | 2024-03-15 02:36:10 | Re: BUG #18387: Erroneous permission checks and/or misleading error messages with refresh materialized view |