From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su> |
Subject: | Re: GiST insert algorithm rewrite |
Date: | 2010-11-16 18:22:22 |
Message-ID: | 4CE2CBDE.2010408@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 16.11.2010 20:01, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> 2. When a page is split, we mark the new left page with a flag to
>> indicate that the downlink for the page to the right hasn't been
>> inserted yet. When the downlink is inserted, the flag is cleared. Again
>> the purpose is to ensure that the tree is self-consistent at all times.
>> If we crash just after a page split, before the downlink is inserted,
>> scans will find the tuples on the right page by following the rightlink.
>> It's slightly less performant, but correct.
>
> The one thought that comes to mind is how does the flag business work
> after multiple splittings? That is, assume we have a page that has the
> flag set because of a previous crash. If we have to split either that
> page or its right sibling, what do we do with the flags?
As I mentioned in the README, the insertion algorithm finishes any
incomplete splits it sees before proceeding. AFAICS that should ensure
that the situation never arises where you try to split a page that
already has the flag set. Or its right sibling; such a page can only be
reached via the rightlink so you would see the page with the flag set first.
Hmm, there is one corner-case that I didin't think of before: One
backend splits a leaf page, and another backend concurrently splits the
parent of that leaf page. If for some reason the backend operating on
the parent page dies, releasing the locks, the other backend will see
the incomplete split when it walks up to insert the downlink. Although
it should be possible to handle that, I think we can simply give up on
inserting the downlink in that case, and leave that split incomplete as
well. It's still correct, and next insert that comes along will complete
the splits, from top to bottom.
BTW, I don't try to fix incomplete splits during vacuum in the patch.
That's perhaps a bit surprising, and probably would be easy to add, but
I left it out for now as it's not strictly necessary.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2010-11-16 18:28:50 | Re: Extensible executor nodes for preparation of SQL/MED |
Previous Message | Tom Lane | 2010-11-16 18:12:20 | Re: unlogged tables |