From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Takashi Horikawa <t-horikawa(at)aj(dot)jp(dot)nec(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Partitioned checkpointing
Date: 2015-09-11 13:56:30
Message-ID: CANP8+jKHDrwDD5Qc4dRYo2mNKoeLkTvF7QFDbnh0oiqAfVZ67A@mail.gmail.com
Lists: pgsql-hackers
On 11 September 2015 at 09:07, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
> Some general comments :
>
Thanks for the summary, Fabien.
> I understand that what this patch does is cutting the checkpoint of
> buffers in 16 partitions, each addressing 1/16 of buffers, and each with
> its own wal-log entry, pacing, fsync and so on.
>
> I'm not sure why it would be much better, although I agree that it may
> have some small positive influence on performance, but I'm afraid it may
> also degrade performance in some conditions. So I think that a better
> understanding of why performance improves, and a focus on that, could
> help obtain a more systematic gain.
>
I think it's a good idea to partition the checkpoint, but not by doing it
this way.
Splitting with N=16 does nothing to guarantee the partitions are equally
sized, so there would likely be an imbalance that would reduce the
effectiveness of the patch.
> This method interacts with the current proposal to improve the
> checkpointer behavior by avoiding random I/Os, but it could be combined.
>
> I'm wondering whether the benefits you see are linked to the file flushing
> behavior induced by fsyncing more often, in which case it is quite close to
> the "flushing" part of the current "checkpoint continuous flushing" patch,
> and could be redundant with, or less efficient than, what is done there,
> especially as tests have shown that the effect of flushing is *much* better
> on sorted buffers.
>
> Another proposal around, suggested by Andres Freund I think, is that the
> checkpointer could fsync files while checkpointing and not wait for the end
> of the checkpoint. I think that may also be one of the reasons why your
> patch brings a benefit, but Andres' approach would be more systematic,
> because there would be no need to fsync files several times (basically your
> patch issues 16 fsyncs per file). This suggests that the "partitioning"
> should be done at a lower level, from within CheckPointBuffers, which
> would take care of fsyncing files some time after writing buffers to them
> is finished.
The idea of doing a partial pass through shared buffers, writing only a
fraction of the dirty buffers and then fsyncing them, is a good one.
The key point is that we spread out the fsyncs across the whole checkpoint
period.
I think we should write out all the buffers for a particular file in one
pass, then issue one fsync per file. More than one fsync per file seems a
bad idea.
So we'd need logic like this:
1. Run through shared buffers and analyze the files contained there
2. Assign files to one of N batches so we can make N roughly equal-sized
mini-checkpoints
3. Make N passes through shared buffers, writing out the files assigned to
each batch as we go
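
For illustration only, a rough sketch of that flow in C could look like the
following; the names used here (DirtyFile, scan_dirty_files(),
write_buffers_for_file(), fsync_file(), NUM_BATCHES) are hypothetical
stand-ins, not the actual bufmgr/checkpointer API:

    #include <stdlib.h>

    #define NUM_BATCHES 16

    typedef struct DirtyFile
    {
        int     file_id;    /* identifies one relation segment file */
        int     ndirty;     /* number of dirty shared buffers in that file */
        int     batch;      /* mini-checkpoint batch assigned in step 2 */
    } DirtyFile;

    /* Step 1 helper: scan shared buffers, return per-file dirty counts. */
    extern DirtyFile *scan_dirty_files(int *nfiles);

    /* Write every dirty buffer belonging to one file, then fsync it. */
    extern void write_buffers_for_file(int file_id);
    extern void fsync_file(int file_id);

    static int
    cmp_ndirty_desc(const void *a, const void *b)
    {
        return ((const DirtyFile *) b)->ndirty - ((const DirtyFile *) a)->ndirty;
    }

    void
    partitioned_checkpoint(void)
    {
        int         nfiles;
        DirtyFile  *files = scan_dirty_files(&nfiles);      /* step 1 */
        long        batch_load[NUM_BATCHES] = {0};

        /*
         * Step 2: greedy assignment -- largest files first, each to the
         * currently lightest batch, so the batches end up roughly equal.
         */
        qsort(files, nfiles, sizeof(DirtyFile), cmp_ndirty_desc);
        for (int i = 0; i < nfiles; i++)
        {
            int     lightest = 0;

            for (int b = 1; b < NUM_BATCHES; b++)
                if (batch_load[b] < batch_load[lightest])
                    lightest = b;
            files[i].batch = lightest;
            batch_load[lightest] += files[i].ndirty;
        }

        /*
         * Step 3: one pass per batch; each file is written once and fsynced
         * once, and the fsyncs are spread across the whole checkpoint.
         */
        for (int b = 0; b < NUM_BATCHES; b++)
        {
            for (int i = 0; i < nfiles; i++)
            {
                if (files[i].batch != b)
                    continue;
                write_buffers_for_file(files[i].file_id);
                fsync_file(files[i].file_id);
            }
            /* checkpoint pacing / sleep between batches would go here */
        }

        free(files);
    }

The greedy assignment in step 2 is just one way to balance the batches; the
point is that each file is written and fsynced exactly once, with the fsyncs
spread across the whole checkpoint period.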
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services