From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2015-06-24 06:29:04
Message-ID: alpine.DEB.2.10.1506240628160.3535@sto
Lists: pgsql-hackers
>> Besides, causing additional cacheline bouncing during the
>> sorting process is a bad idea.
>
> Hmmm. The impact would be to multiply the memory required by 3 or 4 (buf_id,
> relation, forknum, offset), instead of just buf_id, and I understood that
> memory was a concern.
>
> Moreover, once the sort process gets the cache lines which contain the sorting
> data from the buffer descriptors, I think that it should be pretty much okay.
> Incidentally, they would probably have been brought into cache by the scan that
> collects them. Also, I do not think that the sorting time for 128000 buffers,
> and the possible cache misses, is a big issue, but I do not have a measure to
> defend that. I could try to collect some data about that.
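To make the tradeoff concrete, here is a minimal standalone sketch (not the
patch itself; names such as SortKey are made up for illustration) of the two
layouts under discussion: a compact key copied per buffer, versus keeping bare
buf_ids and chasing the buffer descriptors from inside the comparator:

  #include <stdint.h>
  #include <stdlib.h>
  #include <stdio.h>

  /* compact copied key: roughly 3-4 times the memory of a bare buf_id */
  typedef struct SortKey
  {
      uint32_t relnode;    /* relation */
      uint32_t forknum;    /* fork */
      uint32_t blocknum;   /* offset within the fork */
      int      buf_id;     /* what the writer actually needs afterwards */
  } SortKey;

  static int
  sortkey_cmp(const void *a, const void *b)
  {
      const SortKey *ka = a, *kb = b;

      if (ka->relnode != kb->relnode)
          return ka->relnode < kb->relnode ? -1 : 1;
      if (ka->forknum != kb->forknum)
          return ka->forknum < kb->forknum ? -1 : 1;
      if (ka->blocknum != kb->blocknum)
          return ka->blocknum < kb->blocknum ? -1 : 1;
      return 0;
  }

  int
  main(void)
  {
      SortKey keys[] = { {42, 0, 7, 3}, {17, 0, 1, 1}, {42, 0, 2, 2} };

      qsort(keys, 3, sizeof(SortKey), sortkey_cmp);

      for (int i = 0; i < 3; i++)
          printf("rel=%u fork=%u block=%u buf_id=%d\n",
                 (unsigned) keys[i].relnode, (unsigned) keys[i].forknum,
                 (unsigned) keys[i].blocknum, keys[i].buf_id);
      return 0;
  }

  /*
   * The alternative keeps only an array of buf_ids and makes the comparator
   * look the tag up in the shared buffer descriptors, which is where the
   * cacheline-bouncing concern quoted above comes from.
   */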
I've collected some data by adding a "sort time" measure (see the timing sketch
after the log excerpts below), with checkpoint_sort_size=10000000 so that the
sorting is done in one chunk, and ran some large checkpoints:
LOG: checkpoint complete: wrote 41091 buffers (6.3%);
0 transaction log file(s) added, 0 removed, 0 recycled;
sort=0.024 s, write=0.488 s, sync=8.790 s, total=9.837 s;
sync files=41, longest=8.717 s, average=0.214 s;
distance=404972 kB, estimate=404972 kB
LOG: checkpoint complete: wrote 212124 buffers (32.4%);
0 transaction log file(s) added, 0 removed, 0 recycled;
sort=0.078 s, write=128.885 s, sync=1.269 s, total=131.646 s;
sync files=43, longest=1.155 s, average=0.029 s;
distance=2102950 kB, estimate=2102950 kB
LOG: checkpoint complete: wrote 384427 buffers (36.7%);
0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.120 s, write=83.995 s, sync=13.944 s, total=98.035 s;
sync files=9, longest=13.724 s, average=1.549 s;
distance=3783305 kB, estimate=3783305 kB
LOG: checkpoint complete: wrote 809211 buffers (77.2%);
0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.358 s, write=138.146 s, sync=14.943 s, total=153.124 s;
sync files=13, longest=14.871 s, average=1.149 s;
distance=8075338 kB, estimate=8075338 kB
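The sort= figures above come from a timing hook around the sort call. For
reference, a minimal sketch of how such a phase measurement is typically taken
with PostgreSQL's instr_time macros; the exact hook placement and variable
names are my assumptions, not necessarily what the attached patch does:

  /*
   * Sketch only: timing the sort phase of a checkpoint.  Where exactly this
   * goes (e.g. in BufferSync()) is an assumption.
   */
  #include "portability/instr_time.h"

  instr_time  sort_start, sort_duration;

  INSTR_TIME_SET_CURRENT(sort_start);

  /* qsort() the collected to-be-written buffers here */

  INSTR_TIME_SET_CURRENT(sort_duration);
  INSTR_TIME_SUBTRACT(sort_duration, sort_start);

  /* later folded into the "checkpoint complete" message as sort=%.3f s */
  elog(LOG, "sort time: %.3f s", INSTR_TIME_GET_DOUBLE(sort_duration));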
Summary of these checkpoints:

  #buffers    size   sort (s)
     41091   328MB      0.024
    212124   1.7GB      0.078
    384427   2.9GB      0.120
    809211   6.2GB      0.358
Sort times are pretty negligible compared to the whole checkpoint time, and
stay under 0.1 s per GB of buffers sorted (e.g. 0.358 s / 6.2 GB, about
0.06 s/GB, for the largest case).
On a 512 GB server with shared_buffers=128GB (25%), this suggests a worst-case
checkpoint sorting time of a few seconds, and then you have a hundred GB to
write anyway. If we project a 1 TB checkpoint for the next decade, that would
put sorting at under a minute... but then you have 1 TB of data to dump.
As a comparison point, I've done the large checkpoint with the default
sort size of 131072:
LOG: checkpoint complete: wrote 809211 buffers (77.2%);
0 transaction log file(s) added, 0 removed, 1 recycled;
sort=0.251 s, write=152.377 s, sync=15.062 s, total=167.453 s;
sync files=13, longest=14.974 s, average=1.158 s;
distance=8075338 kB, estimate=8075338 kB
The 0.251 s sort time (in 131072-buffer chunks) is to be compared to the
0.358 s above (one big chunk). Well, n·log(n) is not too bad, as expected.
These figures suggest that sorting time and the associated cache misses are
not a significant issue, and thus not worth bothering much about; they also
suggest that a simple boolean option would probably be quite acceptable
instead of the chunk approach.
Attached is an updated version of the patch which turns the sort option into a
boolean, and also includes the sort time in the checkpoint log.
There is still an open question about whether the sorting buffer allocation is
lost on some signals and should be reallocated in such an event.
--
Fabien.
Attachment: checkpoint-continuous-flush-4.patch (text/x-diff, 42.5 KB)