Re: B-tree parent pointer and checkpoints

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: B-tree parent pointer and checkpoints
Date:	2010-11-08 13:40:10
Message-ID:	4CD7FDBA.1020506@enterprisedb.com
Lists:	pgsql-hackers

On 02.11.2010 16:40, Heikki Linnakangas wrote:
> On 02.11.2010 16:30, Tom Lane wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> I think we can fix this by requiring that any multi-WAL-record actions
>>> that are in progress when a checkpoint starts (at the REDO-pointer) must
>>> finish before the checkpoint record is written.
>>
>> What happens if someone wants to start a new split while the checkpoint
>> is hanging fire?
>
> You mean after CreateCheckPoint has determined the redo pointer, but
> before it has written the checkpoint record? The new split can go ahead,
> and the checkpoint doesn't need to care about it. Recovery will start at
> the redo pointer, so it will see the split record, and will know to
> finish the incomplete split if necessary.
>
> The logic is the same as with inCommit. Checkpoint will fetch the list
> of in-progress splits some time after determining the redo-pointer. It
> will then wait until all of those splits have finished. Any new splits
> that begin after fetching the list don't affect the checkpoint.
>
> inCommit can't be used as is, because it's tied to the Xid, but
> something similar should work.
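
To make the recovery side concrete, here is a minimal C sketch of the rule described above. It assumes a simplified log in which each split writes a "begin" record and a later "finish" record that carry the same hypothetical split_id. WalRecord, RecKind and find_incomplete_splits are illustrative names only, not PostgreSQL's actual WAL format or replay code. Replay starts at the redo pointer; a finish record whose begin precedes the redo pointer is ignored, and any split still open at the end of the log is one that recovery would have to complete.

```c
#include <assert.h>

/* Hypothetical, simplified WAL record kinds; not PostgreSQL's real format. */
typedef enum { REC_OTHER, REC_SPLIT_BEGIN, REC_SPLIT_FINISH } RecKind;

typedef struct {
    long lsn;      /* position of the record in the log */
    RecKind kind;
    int split_id;  /* which split this record belongs to (hypothetical) */
} WalRecord;

#define MAX_PENDING 64

/*
 * Replay records from redo_lsn onward and collect the splits whose
 * begin record was seen but whose finish record was not. A finish
 * record for a split that began before redo_lsn matches nothing and
 * is skipped. Writes the open split ids to out[] and returns their count.
 */
static int find_incomplete_splits(const WalRecord *log, int nrecs,
                                  long redo_lsn, int *out)
{
    int pending[MAX_PENDING];
    int npending = 0;

    for (int i = 0; i < nrecs; i++) {
        if (log[i].lsn < redo_lsn)
            continue;  /* recovery starts at the redo pointer */

        if (log[i].kind == REC_SPLIT_BEGIN) {
            assert(npending < MAX_PENDING);
            pending[npending++] = log[i].split_id;
        } else if (log[i].kind == REC_SPLIT_FINISH) {
            for (int j = 0; j < npending; j++) {
                if (pending[j] == log[i].split_id) {
                    pending[j] = pending[--npending];  /* order is irrelevant */
                    break;
                }
            }
        }
    }

    for (int j = 0; j < npending; j++)
        out[j] = pending[j];
    return npending;
}
```

Because the checkpoint waits for every split that was already running at the redo pointer, those splits write their finish records before the checkpoint record. The only splits replay can find open are therefore ones whose begin record it has seen itself.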

Here's a first draft of this, using the inCommit flag as is. It works,
but suffers from starvation if you have a lot of concurrent
multi-WAL-record actions. I tested that by running INSERTs to a table
with a tsvector column that has a GiST index on it, from five concurrent
sessions, and saw checkpoints regularly busy-waiting for over a minute.

To avoid that, we need something a little bit more complicated than a
boolean flag. I'm thinking of adding a counter beside the inCommit flag
that's incremented every time a new multi-WAL-record action begins, so
that the checkpoint process can distinguish between a new action that
was started after deciding the REDO pointer and an old one that's still
running.
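
A minimal sketch of the counter idea in C follows. It models the shared per-backend state as a plain array and leaves out the locking that a real implementation would need around these reads and writes. BackendSlot, begin_action, end_action, snapshot_actions and the two must_wait_* functions are hypothetical names. When the checkpoint takes its snapshot, it records each busy backend's counter. After that it waits only while that same action is still running. The flag-only check mirrors the first draft and shows why that draft can starve.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BACKENDS 8

/* Hypothetical per-backend shared state; names are illustrative only. */
typedef struct {
    bool in_action;              /* analogous to the inCommit flag */
    unsigned long action_count;  /* bumped each time a new action begins */
} BackendSlot;

/* One entry of the checkpoint's snapshot of in-progress actions. */
typedef struct {
    int backend;
    unsigned long action_count;
} PendingAction;

static void begin_action(BackendSlot *s) { s->action_count++; s->in_action = true; }
static void end_action(BackendSlot *s)   { s->in_action = false; }

/* Taken by the checkpoint some time after the REDO pointer is fixed. */
static int snapshot_actions(const BackendSlot *slots, int n, PendingAction *out)
{
    assert(n <= MAX_BACKENDS);
    int k = 0;
    for (int i = 0; i < n; i++) {
        if (slots[i].in_action) {
            out[k].backend = i;
            out[k].action_count = slots[i].action_count;
            k++;
        }
    }
    return k;
}

/* First-draft check: any action on a listed backend keeps us waiting. */
static bool must_wait_flag_only(const BackendSlot *slots,
                                const PendingAction *snap, int k)
{
    for (int j = 0; j < k; j++)
        if (slots[snap[j].backend].in_action)
            return true;
    return false;
}

/* Counter-based check: wait only for the exact actions in the snapshot. */
static bool must_wait_counter(const BackendSlot *slots,
                              const PendingAction *snap, int k)
{
    for (int j = 0; j < k; j++) {
        const BackendSlot *s = &slots[snap[j].backend];
        if (s->in_action && s->action_count == snap[j].action_count)
            return true;
    }
    return false;
}
```

Suppose backend 0 finishes the action that the checkpoint saw and immediately starts another one. must_wait_flag_only() still returns true in that case, which is the starvation seen in the GiST test. must_wait_counter() returns false, because the counter has moved past the value that was recorded in the snapshot.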

(inCommit is a misnomer now, of course. I will need to find a better name.)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
split-delay-checkpoint-1.patch	text/x-diff	14.3 KB

In response to

Re: B-tree parent pointer and checkpoints at 2010-11-02 14:40:59 from Heikki Linnakangas

Responses

Re: B-tree parent pointer and checkpoints at 2010-11-10 18:58:07 from Heikki Linnakangas
