From: | Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> |
---|---|
To: | Michael Paquier <michael(dot)paquier(at)gmail(dot)com> |
Cc: | pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: FSM corruption leading to errors |
Date: | 2016-10-10 14:41:16 |
Message-ID: | CABOikdM5rw=25qQc+wZoYN5yym2r09Q9X0Ria4_P48CGeCRU_g@mail.gmail.com |
Lists: | pgsql-hackers |
On Mon, Oct 10, 2016 at 7:55 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:
>
>
> + /*
> + * See comments in GetPageWithFreeSpace about handling outside the valid
> + * range blocks
> + */
> + nblocks = RelationGetNumberOfBlocks(rel);
> + while (target_block >= nblocks && target_block != InvalidBlockNumber)
> + {
> + target_block = RecordAndGetPageWithFreeSpace(rel, target_block, 0,
> + spaceNeeded);
> + }
> Hm. This is just a workaround. Even if things are done this way the
> FSM will remain corrupted.
No, because the code above updates the FSM entries of those out-of-range
blocks. But now that I look at it again, maybe this is not correct, and it
may get into an endless loop if the relation is repeatedly extended
concurrently.
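
To make the termination concern concrete, here is a minimal, self-contained
sketch of a bounded variant of the loop. It does not use the real PostgreSQL
FSM API and is not the proposed patch: ToyRel, toy_search,
toy_record_and_search, toy_get_valid_block and TOY_MAX_RETRIES are all
hypothetical stand-ins, and the FSM is modelled as a flat array. The only
point is that capping the retries lets the caller fall back to extending the
relation instead of spinning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_BLOCK UINT32_MAX  /* stand-in for InvalidBlockNumber */
#define TOY_FSM_SLOTS     16          /* size of the toy FSM (assumption) */
#define TOY_MAX_RETRIES   4           /* hypothetical cap on repair attempts */

/* Toy relation: the block count plus the free space the FSM advertises. */
typedef struct ToyRel
{
    uint32_t nblocks;                 /* blocks that really exist */
    uint8_t  avail[TOY_FSM_SLOTS];    /* recorded free space per FSM slot */
} ToyRel;

/* Return the first slot advertising at least `needed`, else invalid. */
static uint32_t
toy_search(const ToyRel *rel, uint8_t needed)
{
    for (uint32_t blk = 0; blk < TOY_FSM_SLOTS; blk++)
        if (rel->avail[blk] >= needed)
            return blk;
    return TOY_INVALID_BLOCK;
}

/* Record new free space for `blk`, then search again. */
static uint32_t
toy_record_and_search(ToyRel *rel, uint32_t blk, uint8_t new_avail,
                      uint8_t needed)
{
    rel->avail[blk] = new_avail;
    return toy_search(rel, needed);
}

/*
 * Look for a block with enough space, zeroing stale entries that point past
 * the end of the relation. `needed` is assumed to be greater than zero.
 * After TOY_MAX_RETRIES repairs, give up and return TOY_INVALID_BLOCK so
 * that the caller can extend the relation.
 */
static uint32_t
toy_get_valid_block(ToyRel *rel, uint8_t needed)
{
    uint32_t target;
    int      retries = 0;

    assert(rel != NULL);
    target = toy_search(rel, needed);

    /* nblocks is re-read on every pass, since the relation may grow. */
    while (target != TOY_INVALID_BLOCK && target >= rel->nblocks)
    {
        if (retries++ >= TOY_MAX_RETRIES)
            return TOY_INVALID_BLOCK;
        target = toy_record_and_search(rel, target, 0, needed);
    }
    return target;
}
```

In this toy model a stale entry costs one pass of the loop, and a relation
with many stale entries stops being repaired once the cap is reached; the
remaining entries are left for a later call.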
> And isn't that going to break once the
> relation is extended again?
Once the underlying bug is fixed, I don't see why it should break again. I
added the above code mostly to deal with already-corrupt FSMs. Maybe we
can just document this and leave it to the user to run some correctness
checks (see below), especially given that the code is not cheap and adds
overhead for everybody, irrespective of whether they have or will ever have
a corrupt FSM.
> I'd suggest instead putting in the release
> notes a query that allows one to analyze what are the relations broken
> and directly have them fixed. That's annoying, but it would be really
> better than a workaround. One idea here is to use pg_freespace() and
> see if it returns a non-zero value for an out-of-range block on a
> standby.
>
>
Right, that's how I tested for broken FSMs. A challenge with any such query
is that if the shared buffer copy of the FSM page is intact, then the query
won't report the problematic FSMs. Of course, if the fix is applied to the
standby and the standby is restarted, then corrupt FSMs can be detected.
>
> At the same time, I have translated your script into a TAP test, I
> found that more useful when testing.
>
Thanks for doing that.
Thanks,
Pavan
--
Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services