Re: Enable data checksums by default

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Christoph Berg <myon(at)debian(dot)org>, Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(at)paquier(dot)xyz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Enable data checksums by default
Date: 2019-03-30 20:17:20
Message-ID: 23278C72-65BD-4EAE-85D4-15CC51E77DC9@anarazel.de
Lists: pgsql-hackers

On March 30, 2019 3:25:43 PM EDT, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>On Fri, Mar 29, 2019 at 08:35:26PM +0100, Christoph Berg wrote:
>>Re: Bernd Helmle 2019-03-29
><3586bb9345a59bfc8d13a50a7c729be1ee6759fd(dot)camel(at)oopsware(dot)de>
>>> Am Freitag, den 29.03.2019, 23:10 +0900 schrieb Michael Paquier:
>>> >
>>> > I can't really believe that many people set up shared_buffers at
>>> > 128kB, which would cause such a large number of page evictions,
>>> > but I can believe that many users have shared_buffers set to its
>>> > default value, and that we are going to get complaints about
>>> > "performance drop after upgrade to v12" if we switch data
>>> > checksums to on by default.
>>>
>>> Yeah, I think Christoph's benchmark is based on this thinking. I
>>> assume this very unrealistic scenario should emulate the worst case
>>> (many buffer reads, high checksum-calculation load).
>>
>>It's not unrealistic to have large seqscans that are all buffer
>>misses; the table just has to be big enough. The idea in my benchmark
>>was that if I make shared buffers really small, and the table still
>>fits into RAM, I should be seeing only buffer misses, but without any
>>delay for actually reading from disk.
>>
>>Christoph
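The setup Christoph describes might be reproduced along these lines (a sketch; the paths, scale, and run length are illustrative, not his exact parameters):

```shell
# Throwaway cluster with a tiny buffer pool: with shared_buffers=128kB,
# almost every page access is a shared-buffers miss served from the OS
# page cache (at scale 50 the data is on the order of 750 MB, which
# still fits in RAM, so there is no real disk latency).
initdb -D /tmp/ckbench
pg_ctl -D /tmp/ckbench -o "-c shared_buffers=128kB" -w start

createdb bench
pgbench -i -s 50 bench          # load the pgbench tables

# Read-only run: each of these single-row lookups has to read its page
# into shared buffers, paying the checksum-verification cost on a
# checksummed cluster.
pgbench -S -T 60 -c 4 -j 4 bench

pg_ctl -D /tmp/ckbench -w stop
```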
>>
>
>FWIW I think it's a mistake to focus solely on CPU utilization, which
>all the benchmarks performed on this thread do, because they look at
>tps of in-memory read-only workloads. Checksums have other costs too,
>not just the additional CPU time. Most importantly, they require
>wal_log_hints to be set (which people may or may not want anyway).
>
>I've done a simple benchmark, that does read-only (-S) and read-write
>(-N) pgbench runs with different scales, but also measures duration of
>the pgbench init and amount of WAL produced during the tests.
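One run of the benchmark described above might look roughly like this (a sketch, not Tomas's actual script; durations and client counts are guesses):

```shell
scale=200
createdb bench

# Init duration: time the pgbench initialization for this scale.
/usr/bin/time -f '%e s' pgbench -i -s $scale bench

# Bracket the runs with WAL positions to measure WAL produced.
wal_start=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)

pgbench -S -T 300 -c 16 -j 8 bench   # read-only run
pgbench -N -T 300 -c 16 -j 8 bench   # read-write (simple-update) run

wal_end=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)
psql -Atc "SELECT pg_size_pretty(pg_wal_lsn_diff('$wal_end', '$wal_start'))" bench
```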
>
>On a small machine (i5, 8GB RAM, SSD RAID) the results are these:
>
> scale config | init tps wal
> =========================|==================================
> ro 10 no-hints | 2 117038 130
> hints | 2 116378 146
> checksums | 2 115619 147
> -------------------|----------------------------------
> 200 no-hints | 32 88340 2407
> hints | 37 86154 2628
> checksums | 36 83336 2624
> -------------------|----------------------------------
> 2000 no-hints | 365 38680 1967
> hints | 423 38670 2123
> checksums | 504 37510 2046
> -------------------------|----------------------------------
> rw 10 no-hints | 2 19691 437
> hints | 2 19712 437
> checksums | 2 19654 437
> -------------------|----------------------------------
> 200 no-hints | 32 15839 2745
> hints | 37 15735 2783
> checksums | 36 15646 2775
> -------------------|----------------------------------
> 2000 no-hints | 365 5371 3721
> hints | 423 5270 3671
> checksums | 504 5094 3574
>
>The no-hints config is default (wal_log_hints=off, data_checksums=off),
>hints sets wal_log_hints=on and checksums enables data checksums. All
>the configs were somewhat tuned (1GB shared buffers, max_wal_size high
>enough not to hit checkpoints very often, etc.).
>
>I've also done the tests on a larger machine (2x E5-2620v4, 32GB of
>RAM, NVMe SSD), and the general pattern is about the same - while the
>tps and amount of WAL (not covering the init) does not change, the time
>for initialization increases significantly (by 20-40%).
>
>This effect is even clearer when using slower storage (SATA-based
>RAID). The results then look like this:
>
> scale config | init tps wal
> =========================|==================================
> ro 100 no-hints | 49 229459 122
> hints | 101 167983 190
> checksums | 103 156307 190
> -------------------|----------------------------------
> 1000 no-hints | 580 152167 109
> hints | 1047 122814 142
> checksums | 1080 118586 141
> -------------------|----------------------------------
> 6000 no-hints | 4035 508 1
> hints | 11193 502 1
> checksums | 11376 506 1
> -------------------------|----------------------------------
> rw 100 no-hints | 49 279 192
> hints | 101 275 190
> checksums | 103 275 190
> -------------------|----------------------------------
> 1000 no-hints | 580 237 210
> hints | 1047 225 201
> checksums | 1080 224 200
> -------------------|----------------------------------
> 6000 no-hints | 4035 135 123
> hints | 11193 133 122
> checksums | 11376 132 121
>
>and when expressed as relative to no-hints:
>
> scale config | init tps wal
> ============================|===============================
> ro 100 hints | 206% 73% 155%
> checksums | 210% 68% 155%
> -------------------|--------------------------------
> 1000 hints | 181% 81% 131%
> checksums | 186% 78% 129%
> -------------------|--------------------------------
> 6000 hints | 277% 99% 100%
> checksums | 282% 100% 104%
> ----------------------------|--------------------------------
> rw 100 hints | 206% 99% 99%
> checksums | 210% 99% 99%
> -------------------|--------------------------------
> 1000 hints | 181% 95% 96%
> checksums | 186% 95% 95%
> -------------------|--------------------------------
> 6000 hints | 277% 99% 99%
> checksums | 282% 98% 98%
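As a sanity check, each relative number is just the config's raw value divided by the no-hints value at the same scale; for instance, the ro/100 row can be reproduced with:

```shell
# init: hints took 101 s vs. 49 s for no-hints
awk 'BEGIN { printf "%.0f%%\n", 100 * 101 / 49 }'        # 206%
# tps: 167983 tps vs. 229459 tps for no-hints
awk 'BEGIN { printf "%.0f%%\n", 100 * 167983 / 229459 }' # 73%
```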
>
>I have not investigated the exact reasons, but my hypothesis is that
>it's about the amount of WAL generated during the initial CREATE INDEX
>(because it probably ends up setting the hint bits), which puts
>additional pressure on the storage.
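One way to test this hypothesis would be to measure the WAL generated by just the index-build step of the init, using pgbench's `-I` init-steps option (available since PostgreSQL 11; a sketch):

```shell
createdb bench
# Create tables and load data only (d=drop, t=create tables, g=generate);
# skip vacuum and primary keys for now.
pgbench -i -I dtg -s 1000 bench

before=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)
pgbench -i -I p -s 1000 bench    # just the primary-key (CREATE INDEX) step
after=$(psql -Atc "SELECT pg_current_wal_lsn()" bench)

# WAL attributable to the index build; compare across the three configs.
psql -Atc "SELECT pg_size_pretty(pg_wal_lsn_diff('$after', '$before'))" bench
```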
>
>Unfortunately, this additional cost is unlikely to go away :-(
>
>Now, maybe we want to enable checksums by default anyway, but we should
>not pretend the only cost related to checksums is CPU usage.

Thanks for running these, very helpful.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
