Re: Partitioned checkpointing

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Takashi Horikawa <t-horikawa(at)aj(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned checkpointing
Date: 2015-09-11 13:56:30
Message-ID: CANP8+jKHDrwDD5Qc4dRYo2mNKoeLkTvF7QFDbnh0oiqAfVZ67A@mail.gmail.com
Lists: pgsql-hackers

On 11 September 2015 at 09:07, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:

> Some general comments :
>

Thanks for the summary Fabien.

> I understand that what this patch does is cutting the checkpoint of
> buffers in 16 partitions, each addressing 1/16 of buffers, and each with
> its own wal-log entry, pacing, fsync and so on.
>
> I'm not sure why it would be much better. I agree that it may have some
> small positive influence on performance, but I'm afraid it may also degrade
> performance under some conditions. A better understanding of why performance
> improves, and a focus on that cause, could help obtain a more systematic
> gain.
>

I think it's a good idea to partition the checkpoint, but not to do it this
way.

Splitting with N=16 does nothing to guarantee the partitions are equally
sized, so there would likely be an imbalance that would reduce the
effectiveness of the patch.

> This method interacts with the current proposal to improve the
> checkpointer behavior by avoiding random I/Os, but it could be combined.
>
> I'm wondering whether the benefit you see is linked to the file flushing
> behavior induced by fsyncing more often, in which case it is quite close to
> the "flushing" part of the current "checkpoint continuous flushing" patch,
> and could be redundant with, or less efficient than, what is done there,
> especially as tests have shown that the effect of flushing is *much* better
> on sorted buffers.
>
> Another proposal around, suggested by Andres Freund I think, is that the
> checkpointer could fsync files while checkpointing rather than waiting for
> the end of the checkpoint. I think that may also be one of the reasons why
> your patch brings a benefit, but Andres's approach would be more systematic,
> because there would be no need to fsync files several times (basically your
> patch issues 16 fsyncs per file). This suggests that the "partitioning"
> should be done at a lower level, from within CheckPointBuffers, which would
> take care of fsyncing each file some time after writing buffers to it has
> finished.

The idea to do a partial pass through shared buffers, write only a fraction
of the dirty buffers, and then fsync them is a good one.

The key point is that we spread out the fsyncs across the whole checkpoint
period.

I think we should write out all buffers for a particular file in one pass,
then issue one fsync per file. More than one fsync per file seems a bad idea.

So we'd need logic like this:
1. Run through shared buffers and analyze the files contained in there
2. Assign files to one of N batches so we can make N roughly equal sized
mini-checkpoints
3. Make N passes through shared buffers, writing out files assigned to each
batch as we go

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
