Re: Partitioned checkpointing

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Takashi Horikawa <t-horikawa(at)aj(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned checkpointing
Date: 2015-09-11 13:56:30
Message-ID: CANP8+jKHDrwDD5Qc4dRYo2mNKoeLkTvF7QFDbnh0oiqAfVZ67A@mail.gmail.com
Lists: pgsql-hackers

On 11 September 2015 at 09:07, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:

> Some general comments :
>

Thanks for the summary Fabien.

> I understand that what this patch does is cutting the checkpoint of
> buffers in 16 partitions, each addressing 1/16 of buffers, and each with
> its own wal-log entry, pacing, fsync and so on.
>
> I'm not sure why it would be much better. I agree that it may have some
> small positive influence on performance, but I'm afraid it may also degrade
> performance under some conditions. A better understanding of why performance
> improves, and a focus on that cause, could help obtain a more systematic
> gain.
>

I think it's a good idea to partition the checkpoint, but not to do it this
way.

Splitting with N=16 does nothing to guarantee the partitions are equally
sized, so there would likely be an imbalance that would reduce the
effectiveness of the patch.

> This method interacts with the current proposal to improve the
> checkpointer behavior by avoiding random I/Os, but it could be combined.
>
> I'm wondering whether the benefit you see is linked to the file flushing
> behavior induced by fsyncing more often, in which case it is quite close to
> the "flushing" part of the current "checkpoint continuous flushing" patch,
> and could be redundant with, or less efficient than, what is done there,
> especially as tests have shown that the effect of flushing is *much* better
> on sorted buffers.
>
> Another proposal around, suggested by Andres Freund I think, is that the
> checkpointer could fsync files while checkpointing rather than waiting for
> the end of the checkpoint. I think that may also be one of the reasons why
> your patch brings a benefit, but Andres's approach would be more systematic,
> because there would be no need to fsync files several times (basically your
> patch issues 16 fsyncs per file). This suggests that the "partitioning"
> should be done at a lower level, from within CheckPointBuffers, which would
> take care of fsyncing each file some time after writing buffers to it has
> finished.

The idea to do a partial pass through shared buffers, write only a fraction
of the dirty buffers, and then fsync them is a good one.

The key point is that we spread out the fsyncs across the whole checkpoint
period.

I think we should write out all buffers for a particular file in one pass,
then issue one fsync per file. More than one fsync per file seems a bad idea.

So we'd need logic like this:
1. Run through shared buffers and analyze the files contained in there
2. Assign files to one of N batches so we can make N roughly equal sized
mini-checkpoints
3. Make N passes through shared buffers, writing out files assigned to each
batch as we go

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
