From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Teodor Sigaev <teodor(at)sigaev(dot)ru>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Subject: | Re: B-tree parent pointer and checkpoints |
Date: | 2010-11-11 15:16:11 |
Message-ID: | 14543.1289488571@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> GiST is different. When you insert a key to a leaf page, you (sometimes)
> need to adjust the parent pointer to reflect the new key as well. B-tree
> tolerates incomplete splits with the 'next page' pointer, but that is
> not applicable to gist. Teodor described the issue back in 2005 when
> WAL-logging was added to GiST
> (http://archives.postgresql.org/pgsql-hackers/2005-06/msg00555.php)
> Reading that I wonder: what harm would an incomplete insert cause if we
> just left it in the tree? Imagine that you insert a key to a leaf page,
> but crash before updating the parent. If you search for the key starting
> from the root, you'll fail to find it, because the parent pointer claims
> that there are no entries with such a key on the child page. But that's
> OK, the inserting transaction aborted with the crash!
I think it'd be okay as far as that one entry is concerned, since as you
say it doesn't matter whether a search finds it. (We'd have to be sure
that VACUUM would still find it to remove it, of course, but that
doesn't use a normal search.) You're right that it poses a hazard of
subsequent inserts deciding that they don't need to do work on upper
levels because the lower ones look OK already. But depending on the
details of the search algorithm, this might be a non-problem: if you
remember that the upper level entries didn't cover your key when you
descended, you'd still know you need to recompute them.
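To make the "remember what you saw on the way down" idea concrete, here is a minimal toy sketch. It is not the GiST code: it assumes a one-dimensional tree whose downlink keys are plain integer ranges, and every name in it (`Node`, `insert`, `search`, `build_tree`, the `remember_descent` and `crash_before_parent_update` flags) is hypothetical. The "naive" branch is only an assumed stand-in for an insert that skips upper-level work because the leaf already looks wide enough.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    # lo/hi model the bounding key stored in the PARENT's downlink to this node.
    lo: Optional[int] = None
    hi: Optional[int] = None
    keys: List[int] = field(default_factory=list)          # leaf entries
    children: List["Node"] = field(default_factory=list)   # internal entries

    def is_leaf(self) -> bool:
        return not self.children

    def covers(self, key: int) -> bool:
        return self.lo is not None and self.lo <= key <= self.hi


def enlargement(child: Node, key: int) -> int:
    # How far the downlink range would have to grow to cover key.
    if child.lo is None:
        return 0
    return max(child.lo - key, 0) + max(key - child.hi, 0)


def search(node: Node, key: int) -> bool:
    # A normal search trusts the downlink ranges and skips uncovered children.
    if node.is_leaf():
        return key in node.keys
    return any(search(c, key) for c in node.children if c.covers(key))


def insert(root: Node, key: int, *, remember_descent: bool = True,
           crash_before_parent_update: bool = False) -> None:
    path: List[Tuple[Node, bool]] = []
    node = root
    while not node.is_leaf():
        child = min(node.children, key=lambda c: enlargement(c, key))
        # Remember whether the upper-level entry covered the key on descent.
        path.append((child, child.covers(key)))
        node = child

    old_union = (min(node.keys), max(node.keys)) if node.keys else None
    node.keys.append(key)
    if crash_before_parent_update:
        return  # simulated crash: leaf changed, downlinks did not
    new_union = (min(node.keys), max(node.keys))

    for child, covered in reversed(path):
        if remember_descent:
            needs_update = not covered
        else:
            # Naive rule: the leaf's contents already spanned the key, so skip.
            needs_update = new_union != old_union
        if needs_update:
            child.lo = key if child.lo is None else min(child.lo, key)
            child.hi = key if child.hi is None else max(child.hi, key)


def build_tree() -> Node:
    left = Node(lo=0, hi=10, keys=[0, 10])
    right = Node(lo=100, hi=110, keys=[100, 110])
    return Node(children=[left, right])


naive, careful = build_tree(), build_tree()
for tree in (naive, careful):
    insert(tree, 50, crash_before_parent_update=True)  # incomplete insert
insert(naive, 40, remember_descent=False)
insert(careful, 40, remember_descent=True)
naive_finds_40 = search(naive, 40)
careful_finds_40 = search(careful, 40)
```

In this toy, the orphaned key 50 stays invisible to searches in both trees, which is harmless since its transaction aborted. The committed key 40 is lost only in the naive tree, where the leaf already "looked OK"; the variant that remembers the uncovered downlink widens it and finds 40 again.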
Something else I just noticed is that WAL replay isn't capable of
completely fixing the index anyway:
     * To complete insert we can't use basic insertion algorithm because
     * during insertion we can't call user-defined support functions of opclass.
     * So, we insert 'invalid' tuples without real key and do it by separate algorithm.
     * 'invalid' tuple should be updated by vacuum full.
Given that there's no more vacuum full, and nobody has been expected to
run it routinely for a long time anyway, this fixup approach seems
pretty completely broken anyhow.
regards, tom lane