From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | 周正中(德歌) <dege(dot)zzz(at)alibaba-inc(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, 张广舟(明虚) <guangzhou(dot)zgz(at)alibaba-inc(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, 范孝剑(康贤) <funnyxj(dot)fxj(at)alibaba-inc(dot)com>, 曾文旌(义从) <wenjing(dot)zwj(at)alibaba-inc(dot)com>, 窦贤明(执白) <xianming(dot)dxm(at)alibaba-inc(dot)com>, 萧少聪(铁庵) <shaocong(dot)xsc(at)alibaba-inc(dot)com>, 陈新坚(惧留孙) <xinjian(dot)chen(at)alibaba-inc(dot)com> |
Subject: | Re: [HACKERS] 答复:[HACKERS] 答复:[HACKERS] about fsync in CLOG buffer write |
Date: | 2015-09-09 14:27:06 |
Message-ID: | CA+TgmoZYEe3apby_nYMe1eTBY12VVud7ZA-ApJodksUpKeZ5Kw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, Sep 8, 2015 at 2:28 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-09-08 15:58:26 +0800, 周正中(德歌) wrote:
>> postgres(at)digoal-> cat 7.sql
>> select txid_current();
>>
>> postgres(at)digoal-> pgbench -M prepared -n -r -P 1 -f ./7.sql -c 1 -j 1 -T 100000
>> About 32K tps.
>> progress: 240.0 s, 31164.4 tps, lat 0.031 ms stddev 0.183
>> progress: 241.0 s, 33243.3 tps, lat 0.029 ms stddev 0.127
>
> So you're benchmarking how fast you can assign txids. Is that actually
> something meaningful? If you have other writes interleaved you'll write
> much more WAL, so there'll be checkpoints and such.
>
> FWIW, if you measure something realistic and there's checkpoints,
> there'll be fewer fsyncs if you increase the slru buffer size - as
> there'll often be clean buffers due to the checkpoint having written
> them out.
But I think it's not very hard to come up with a workload where 32
clog buffers isn't enough, and you end up waiting for backends to
fsync(). Suppose you have a pgbench workload with N tuples. We
update tuples at random, so sometimes we hit one that's just recently
been updated, and other times we hit one that hasn't been updated for
many transactions. At 32k transactions per page, each transaction's
chance of updating a tuple whose existing xmin is on the most recent
clog page is 32k/N. The chance of hitting a tuple whose existing xmin
is on the next page back is (1 - 32k/N) * 32k/N. The chance of
hitting a page whose current xmin is at least X pages prior to the
most recent one is (1 - 32k/N)^X. So, how many tuples do we need in
order for each update to have a 50% chance of hitting a tuple that is
at least 32 pages back?
(1 - 32k/N)^32 = .5
1 - 32k/N = 0.9785720620877
32k/N = 0.0214279379122999
N = 32k/0.0214279379122999
N = 1529218.54329206
...or in other words, scale factor 16. At scale factors >= 1044, the
chance that the next update hits an xmin more than 32 clog buffers
back is > 99%. So any test of this sort causes extremely rapid clog
page eviction - basically every transaction is going to request a
buffer that's probably not cached, and as soon as it's done with it,
some other transaction will evict it to bring in a different buffer
that's not cached.
How often such a workload actually has to replace a *dirty* clog
buffer obviously depends on how often you checkpoint, but if you're
getting ~28k TPS you can completely fill 32 clog buffers (1 million
transactions) in less than 40 seconds, and you're probably not
checkpointing nearly that often.
I'm not entirely sure how much fsync absorption for SLRU buffers will
help, but I think it would be worth trying. Back when I was spending
more time on this area, I saw some evidence that those fsyncs did
cause at least some latency spikes.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2015-09-09 14:27:50 | Re: Counting lines correctly in psql help displays |
Previous Message | Pavel Stehule | 2015-09-09 14:10:02 | Re: [patch] Proposal for \rotate in psql |