From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: B-tree parent pointer and checkpoints |
Date: | 2010-11-08 13:40:10 |
Message-ID: | 4CD7FDBA.1020506@enterprisedb.com |
Lists: | pgsql-hackers |
On 02.11.2010 16:40, Heikki Linnakangas wrote:
> On 02.11.2010 16:30, Tom Lane wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>>> I think we can fix this by requiring that any multi-WAL-record actions
>>> that are in-progress when a checkpoint starts (at the REDO-pointer) must
>>> finish before the checkpoint record is written.
>>
>> What happens if someone wants to start a new split while the checkpoint
>> is hanging fire?
>
> You mean after CreateCheckPoint has determined the redo pointer, but
> before it has written the checkpoint record? The new split can go ahead,
> and the checkpoint doesn't need to care about it. Recovery will start at
> the redo pointer, so it will see the split record, and will know to
> finish the incomplete split if necessary.
>
> The logic is the same as with inCommit. Checkpoint will fetch the list
> of in-progress splits some time after determining the redo-pointer. It
> will then wait until all of those splits have finished. Any new splits
> that begin after fetching the list don't affect the checkpoint.
>
> inCommit can't be used as is, because it's tied to the Xid, but
> something similar should work.
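The checkpoint-side wait in the quoted proposal can be sketched in C. This is a minimal single-process illustration, assuming a hypothetical fixed array of per-backend boolean flags, i.e. an inCommit-style flag used as is, which the quoted text cautions against. `in_split`, `begin_split`, `end_split`, `snapshot_in_progress` and `snapshot_finished` are invented names, not PostgreSQL code, and real shared-memory state would need locking.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BACKENDS 8

/* Hypothetical per-backend flag, modelled on inCommit: true while this
 * backend is in the middle of a multi-WAL-record action such as a split. */
static bool in_split[MAX_BACKENDS];

void
begin_split(int backend)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    in_split[backend] = true;
}

void
end_split(int backend)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    in_split[backend] = false;
}

/* Called by the checkpoint some time after the REDO pointer is fixed:
 * remember which backends had a split in progress at that moment. */
void
snapshot_in_progress(bool waiting_on[MAX_BACKENDS])
{
    for (int i = 0; i < MAX_BACKENDS; i++)
        waiting_on[i] = in_split[i];
}

/* The checkpoint may write its record once this returns true. Backends
 * that were idle at snapshot time are ignored, so splits they start later
 * never delay the checkpoint. */
bool
snapshot_finished(const bool waiting_on[MAX_BACKENDS])
{
    for (int i = 0; i < MAX_BACKENDS; i++)
        if (waiting_on[i] && in_split[i])
            return false;
    return true;
}
```

A backend that was idle at snapshot time never delays the checkpoint. A backend that was busy, however, is identified only by its flag: if it finishes its split and immediately begins another, `snapshot_finished` still returns false, so under a steady stream of splits the checkpoint can keep waiting.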
Here's a first draft of this, using the inCommit flag as is. It works,
but suffers from starvation if you have a lot of concurrent
multi-WAL-record actions. I tested that by running INSERTs into a table
with a GiST-indexed tsvector column from five concurrent sessions, and
saw checkpoints regularly busy-waiting for over a minute.
To avoid that, we need something a little bit more complicated than a
boolean flag. I'm thinking of adding a counter beside the inCommit flag
that's incremented every time a new multi-WAL-record action begins, so
that the checkpoint process can distinguish between a new action that
was started after deciding the REDO pointer and an old one that's still
running.
(inCommit is a misnomer now, of course. I'll need to find a better name.)
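A minimal C sketch of the counter idea, assuming a hypothetical fixed array of per-backend slots in a single process. `BackendSlot`, `action_count`, `snapshot_actions` and `old_actions_finished` are invented names, not PostgreSQL code; a real implementation would keep the slots in shared memory and read them under a lock or with suitable memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BACKENDS 8

/* Hypothetical per-backend state: the inCommit-style flag plus a counter
 * that is incremented every time a new multi-WAL-record action begins. */
typedef struct
{
    bool        in_action;
    uint64_t    action_count;
} BackendSlot;

static BackendSlot slots[MAX_BACKENDS];

void
begin_action(int backend)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    slots[backend].action_count++;
    slots[backend].in_action = true;
}

void
end_action(int backend)
{
    assert(backend >= 0 && backend < MAX_BACKENDS);
    slots[backend].in_action = false;
}

/* What the checkpoint records after deciding the REDO pointer. */
typedef struct
{
    bool        waiting_on[MAX_BACKENDS];
    uint64_t    count_seen[MAX_BACKENDS];
} ActionSnapshot;

void
snapshot_actions(ActionSnapshot *snap)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        snap->waiting_on[i] = slots[i].in_action;
        snap->count_seen[i] = slots[i].action_count;
    }
}

/* An old action is still running only if the flag is set AND the counter
 * has not moved; a changed counter means the backend has since started a
 * new action, which the checkpoint does not need to wait for. */
bool
old_actions_finished(const ActionSnapshot *snap)
{
    for (int i = 0; i < MAX_BACKENDS; i++)
    {
        if (snap->waiting_on[i] && slots[i].in_action &&
            slots[i].action_count == snap->count_seen[i])
            return false;
    }
    return true;
}
```

Comparing counters instead of waiting for the flag to clear is what prevents starvation: each backend can delay the checkpoint for at most the one action it was running when the snapshot was taken, no matter how many new actions it starts afterwards.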
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
split-delay-checkpoint-1.patch | text/x-diff | 14.3 KB |