From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | "k(dot)jamison(at)fujitsu(dot)com" <k(dot)jamison(at)fujitsu(dot)com> |
Cc: | 'Robert Haas' <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [Patch] Optimize dropping of relation buffers using dlist |
Date: | 2019-11-12 19:19:33 |
Message-ID: | 20191112191933.g2ti5ulqurojopsu@development |
Lists: | pgsql-hackers |
On Tue, Nov 12, 2019 at 10:49:49AM +0000, k(dot)jamison(at)fujitsu(dot)com wrote:
>On Thurs, November 7, 2019 1:27 AM (GMT+9), Robert Haas wrote:
>> On Tue, Nov 5, 2019 at 10:34 AM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
>> wrote:
>> > 2) This adds another hashtable maintenance to BufferAlloc etc. but
>> > you've only done tests / benchmark for the case this optimizes. I
>> > think we need to see a benchmark for workload that allocates and
>> > invalidates lot of buffers. A pgbench with a workload that fits into
>> > RAM but not into shared buffers would be interesting.
>>
>> Yeah, it seems pretty hard to believe that this won't be bad for some workloads.
>> Not only do you have the overhead of the hash table operations, but you also
>> have locking overhead around that. A whole new set of LWLocks where you have
>> to take and release one of them every time you allocate or invalidate a buffer
>> seems likely to cause a pretty substantial contention problem.
>
>I'm sorry for the late reply. Thank you Tomas and Robert for checking this patch.
>Attached is the v3 of the patch.
>- I moved the unnecessary items from buf_internals.h to cached_buf.c since most of
> those items are only used in that file.
>- Fixed the bug of v2. Seems to pass both RT and TAP test now
>
>Thanks for the advice on benchmark test. Please refer below for test and results.
>
>[Machine spec]
>CPU: 16, Number of cores per socket: 8
>RHEL6.5, Memory: 240GB
>
>scale: 3125 (about 46GB DB size)
>shared_buffers = 8GB
>
>[workload that fits into RAM but not into shared buffers]
>pgbench -i -s 3125 cachetest
>pgbench -c 16 -j 8 -T 600 cachetest
>
>[Patched]
>scaling factor: 3125
>query mode: simple
>number of clients: 16
>number of threads: 8
>duration: 600 s
>number of transactions actually processed: 8815123
>latency average = 1.089 ms
>tps = 14691.436343 (including connections establishing)
>tps = 14691.482714 (excluding connections establishing)
>
>[Master/Unpatched]
>...
>number of transactions actually processed: 8852327
>latency average = 1.084 ms
>tps = 14753.814648 (including connections establishing)
>tps = 14753.861589 (excluding connections establishing)
>
>
>My patch caused a little overhead of about 0.42-0.46%, which I think is small.
>Kindly let me know your opinions/comments about the patch or tests, etc.
>
Now try measuring that with a read-only workload, with prepared
statements. I've tried that on a machine with 16 cores, doing

    # 16 clients
    pgbench -n -S -j 16 -c 16 -M prepared -T 60 test

    # 1 client
    pgbench -n -S -c 1 -M prepared -T 60 test

and average from 30 runs of each looks like this:

    # clients      master     patched         %
    ---------------------------------------------
    1               29690       27833     93.7%
    16             300935      283383     94.1%

That's quite a significant regression, considering it's optimizing an
operation that is expected to be pretty rare (people are generally not
dropping objects as often as they query them).
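
For anyone who wants to redo the arithmetic, here is a minimal sketch that
recomputes the last column from the averaged numbers in the table above. The
helper name relative_throughput is made up for illustration, and because the
inputs are already rounded averages, the last digit may differ slightly from
the percentages quoted in the table.

```python
# Averaged results from the table above (30 runs each): clients -> (master, patched).
RESULTS = {
    1: (29690, 27833),
    16: (300935, 283383),
}


def relative_throughput(master, patched):
    """Return the patched result as a percentage of the master result.

    Hypothetical helper, not part of pgbench or the patch.
    """
    return 100.0 * patched / master


for clients, (master, patched) in RESULTS.items():
    pct = relative_throughput(master, patched)
    # 100 - pct is the relative drop caused by the patch.
    print(f"{clients:>2} clients: {pct:.1f}% of master ({100.0 - pct:.1f}% drop)")
```

Both client counts come out at roughly a 6% drop, which is what the comparison
above is based on.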
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services