Re: checkpointer continuous flushing

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: checkpointer continuous flushing
Date: 2016-01-12 12:54:21
Message-ID: alpine.DEB.2.10.1601121243160.26748@sto
Lists: pgsql-hackers


Hello Andres,

Thanks for the details. Many comments and some questions below.

>> Also, maybe you could answer a question I had about the performance
>> regression you observed, I could not find the post where you gave the
>> detailed information about it, so that I could try reproducing it: what are
>> the exact settings and conditions (shared_buffers, pgbench scaling, host
>> memory, ...), what is the observed regression (tps? other?), and what is the
>> responsiveness of the database under the regression (eg % of seconds with 0
>> tps for instance, or something like that).
>
> I measured it in a different number of cases, both on SSDs
> and spinning rust.

Argh! This is a key point: the sort/flush is designed to help HDDs and
should have limited effect on SSDs, yet you seem to be showing that the
effect is in fact negative on SSDs, too bad:-(

The bad news is that I do not have a host with a SSD available for
reproducing such results.

On SSDs, the Linux IO scheduler works quite well, so this is a place where
I would consider simply deactivating flushing and/or sorting.

ISTM that I would rather update the documentation to say "do not activate
on SSDs" than try to find a miraculous solution which may or may not
exist. Basically I would use your results to give better advice in the
documentation, not as a motivation to rewrite the patch from scratch.

> postgres-ckpt14 \
> -D /srv/temp/pgdev-dev-800/ \
> -c maintenance_work_mem=2GB \
> -c fsync=on \
> -c synchronous_commit=off \

I'm not sure I like this one. I guess the intention is to focus on
checkpointer writes and reduce the impact of WAL writes. Why not.

> -c shared_buffers=2GB \
> -c wal_level=hot_standby \
> -c max_wal_senders=10 \
> -c max_wal_size=100GB \
> -c checkpoint_timeout=30s

That is a very short checkpoint timeout, but the point is to exercise the
checkpointer, so why not.

> My laptop 1 EVO 840, 1 i7-4800MQ, 16GB ram:
> master:
> scaling factor: 800

The DB is probably about 12GB, so it fits in memory in the end, meaning
that there should be only write activity after some time? So this is not
really the case where it does not fit in memory, but it is large enough to
get mostly random IOs in both read & write, so why not.
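As a sanity check on the "about 12GB" estimate, here is a quick sketch,
assuming ~2048 8kB pages per pgbench scale unit (the figure used elsewhere
in this thread), and ignoring the smaller tables and indexes:

```python
scale = 800             # pgbench scaling factor used in the run
pages_per_unit = 2048   # assumed ~8kB pages per scale unit (accounts table)
page_size = 8192        # bytes per PostgreSQL page

db_bytes = scale * pages_per_unit * page_size
print(f"{db_bytes / 2**30:.1f} GiB")  # prints "12.5 GiB"
```

So roughly 12.5 GiB of accounts data, comfortably close to the 16GB of RAM
on the test laptop.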

> query mode: prepared
> number of clients: 16
> number of threads: 16
> duration: 300 s
> number of transactions actually processed: 1155733

Assuming one buffer accessed per transaction on average, and considering a
uniform random distribution, this means about 50% of pages actually loaded
in memory at the end of the run (1 - e^(-1155733/(800*2048)), with 2048
pages per scale unit).

> latency average: 4.151 ms
> latency stddev: 8.712 ms
> tps = 3851.242965 (including connections establishing)
> tps = 3851.725856 (excluding connections establishing)

> ckpt-14 (flushing by backends disabled):

Is this comment referring to "synchronous_commit = off"?
I guess this is the same on master above, even if not written?

> [...] In neither case there are periods of 0 tps, but both have times of
> 1000 tps with noticeably increased latency.

Ok, but we are talking about SSDs, so things are not too bad, even if
there are ups and downs.

> The endresults are similar with a sane checkpoint timeout - the tests
> just take much longer to give meaningful results. Constantly running
> long tests on prosumer level SSDs isn't nice - I've now killed 5 SSDs
> with postgres testing...

Indeed. It wears them out and costs money, too bad:-(

> As you can see there's roughly a 30% performance regression on the
> slower SSD and a ~9% on the faster one. HDD results are similar (but I
> can't repeat on the laptop right now since the 2nd hdd is now an SSD).

Ok, that is what I would have expected: the larger the database, the
smaller the impact of sorting & flushing on SSDs. Now I would have hoped
that flushing would help get a more constant load even in this case; at
least this is what I measured in my tests. The test closest to your
setting that I ran is scale=660, where sort/flush got 400 tps vs 100 tps
without, with 30-minute checkpoints, but HDDs do not compare to SSDs...

My overall comment about this SSD regression is that the patch is really
designed to make a difference for HDDs, so the advice would be not to
activate it on SSDs if there is a regression in such a case.

Now this is a little disappointing, as on paper sorted writes should also
be slightly better on SSDs, but if the bench says the contrary, I have to
believe the bench:-)

--
Fabien.
