From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Simon Riggs <simon(at)2ndQuadrant(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: New FSM patch |
Date: | 2008-09-04 08:07:19 |
Message-ID: | 48BF9737.2050206@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Thanks for the review!
Simon Riggs wrote:
> Can I check some aspects of this related to Hot Standby? Some of them
> sound obvious, but worth double checking.
>
> * There will be no need to read FSM by any normal operation of a
> read-only transaction, so locking correctness considerations can
> possibly be ignored during recovery.
Correct.
A HOT prune operation doesn't currently update the FSM, but if it did,
that would be a case of a read-only transaction updating the FSM. But we
can't prune anyway in a hot standby.
> pg_freespacemap exists though:
> would we need to prevent that from executing during recovery, or will
> the FSM be fully readable? i.e. does redo take appropriate locks already
> (I don't see any Cleanup locks being required).
pg_freespacemap, the contrib module? Yeah, the FSM should be fully readable.
During normal operation, when a bottom level page is updated, and the
update needs to be bubbled up, the upper level page is pinned and locked
before the lock on the lower level page is released. That interlocking
is not done during WAL replay, and the lock on the lower level page is
released before locking the upper page. It's not required during WAL
replay, as there's no concurrent updates to the FSM.
> * FSM will be continuously maintained during recovery, so FSM will now
> be correct and immediately available when recovery completes?
Correct,
> * There are no cases where a screwed-up FSM will crash either recovery
> (FATAL+) or halt normal operation (PANIC)?
Hmm, there's no explicit elog(FATAL/PANIC) calls, but if the FSM is
really corrupt, you can probably get a segfault. That should be fixable
by adding more sanity checks, though.
> * incomplete action cleanup is fairly cheap and doesn't rely on the FSM
> being searchable to correct the error? This last is a hard one...
Correct.
> Do we have the concept of a invalid/corrupt FSM? What happens if the
> logic goes wrong and we have a corrupt page? Will that mean we can't
> complete actions against the heap?
Some scenarios I can think of:
Scenario: The binary tree on a page is corrupt, so that the value of an
upper node is < Max(leftchild, rightchild).
Consequence: Searchers won't see the free space below that node, and
will look elsewhere.
Scenario: The binary tree on a page is corrupt, so that the value of an
upper node is > Max(leftchild, rightchild).
Consequence: Searchers will notice the corruption while trying to
traverse down that path, and throw an elog(WARNING) in search_avail().
fsm_search will retry the search, and will in worst case go into an
infinite loop. That's obviously not good. We could automatically fix the
upper nodes of the tree, but that would wipe evidence that would be
useful in debugging.
Scenario: An upper level page is corrupt, claiming that there's no free
space on a lower level page, while there actually is. (the opposite,
where an upper level page claims that there *is* free space on a lower
level page, while there actually isn't, is now normal. The searcher will
update the upper level page in that case)
Consequence: A searcher won't see that free space, and will look elsewhere.
Scenario: An upper level page is corrupt, claiming that there is free
space on a lower level page that doesn't exist.
Consequence: Searchers will elog(ERROR), trying to read the non-existent
FSM page.
The 3rd scenario would lead to heap inserts/updates failing. We could
avoid that by checking that the page exists with
RelationGetNumberOfBlocks(), but I'm not sure if it's worth the cost.
> Are there really any changes to these files?
> src/include/storage/bufmgr.h
> src/include/postmaster/bgwriter.h
Hmm, apparently not.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Simon Riggs | 2008-09-04 08:37:27 | StartupCLOG |
Previous Message | Dimitri Fontaine | 2008-09-04 07:57:06 | Re: [PATCH] Cleanup of GUC units code |