From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Nasty btree deletion bug |
Date: | 2006-10-26 15:00:15 |
Message-ID: | 4540CD7F.30909@enterprisedb.com |
Lists: | pgsql-hackers |
Tom Lane wrote:
> I wrote:
>> I've been analyzing Ed L's recent report of index corruption:
>> http://archives.postgresql.org/pgsql-general/2006-10/msg01183.php
Ouch. That's nasty indeed.
> So I think the rule needs to be "don't delete the rightmost child unless
> it's the only child, in which case you can delete the parent too --- but
> the same restriction must be observed at the next level up".
> ....
> The concept of a half-dead page would remain, but it'd be a transient
> state that would normally only persist for a moment between atomic
> page-delete actions. If we crash between two such actions, the
> half-dead page would remain present, but would be found and cleaned up
> by the next VACUUM. In the meantime it wouldn't cause any problem
> because the keyspace it gives up will belong to a sibling of the same
> parent at whatever level the delete is ultimately supposed to stop at,
> and so inserts and even splits in that keyspace won't create an
> inconsistency.
I don't understand how this "in the meantime" thing works. I tried to
work out a step-by-step example; could you take a look at it? See
http://users.tkk.fi/~hlinnaka/pgsql/btree-deletion-bug/
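
For reference, here is a minimal sketch of the deletion rule quoted above, as I read it. It is not PostgreSQL code: the Page class, the deletion_chain function and the page names are hypothetical, and the tree is a plain in-memory structure with no locking, sibling links or WAL. It also assumes the root is never deleted this way, which the quoted text does not state.

```python
from typing import List, Optional


class Page:
    """Hypothetical in-memory stand-in for a btree page (not PostgreSQL's structs)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.parent: Optional["Page"] = None
        self.children: List["Page"] = []  # ordered left to right

    def add(self, child: "Page") -> "Page":
        child.parent = self
        self.children.append(child)
        return child


def deletion_chain(page: Page) -> Optional[List[str]]:
    """Return the names of the pages to delete, bottom-up, or None if forbidden.

    Rule being modelled: a page may be deleted if it is not the rightmost
    child of its parent. A rightmost child may be deleted only if it is the
    only child, in which case the parent goes too, and the same check is
    repeated one level up.
    """
    chain: List[str] = []
    target = page
    while True:
        parent = target.parent
        if parent is None:
            # Assumption: the chain may not reach the root.
            return None
        chain.append(target.name)
        if parent.children[-1] is not target:
            return chain   # not rightmost: the delete stops at this level
        if len(parent.children) > 1:
            return None    # rightmost child with siblings: not allowed
        target = parent    # only child: parent must be deleted as well


# Example tree: root -> A(a1, a2), B(b1), C(c1)
root = Page("root")
A, B, C = root.add(Page("A")), root.add(Page("B")), root.add(Page("C"))
a1, a2 = A.add(Page("a1")), A.add(Page("a2"))
b1 = B.add(Page("b1"))
c1 = C.add(Page("c1"))

print(deletion_chain(a1))  # ['a1']
print(deletion_chain(a2))  # None
print(deletion_chain(b1))  # ['b1', 'B']
print(deletion_chain(c1))  # None
```

In the b1 case the delete stops at B, so the keyspace B gives up goes to its right sibling C under the same parent. The sketch says nothing about the half-dead state between the individual steps, which is the part my question is about.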
> ...
>
> Comments? Have I missed anything?
It took me a lot of time with pen and paper to understand the issue, and
I'm still not sure I understand it fully. The logic is very complex,
which is bad for maintainability in itself :(.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com