Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date: 2016-10-20 20:04:58
Message-ID: ecb99330-cdcc-dd53-983f-03f787c01fa4@2ndquadrant.com
Lists: pgsql-hackers

On 10/20/2016 07:59 PM, Robert Haas wrote:
> On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>>> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> ...
>>
>> So here's my theory. The whole reason why Tomas is having difficulty
>> seeing any big effect from these patches is because he's testing on
>> x86. When Dilip tests on x86, he doesn't see a big effect either,
>> regardless of workload. But when Dilip tests on POWER, which I think
>> is where he's mostly been testing, he sees a huge effect, because for
>> some reason POWER has major problems with this lock that don't exist
>> on x86.
>>
>> If that's so, then we ought to be able to reproduce the big gains on
>> hydra, a community POWER server. In fact, I think I'll go run a quick
>> test over there right now...
>
> And ... nope. I ran a 30-minute pgbench test on unpatched master
> using unlogged tables at scale factor 300 with 64 clients and got
> these results:
>
>     14  LWLockTranche | wal_insert
>     36  LWLockTranche | lock_manager
>     45  LWLockTranche | buffer_content
>    223  Lock          | tuple
>    527  LWLockNamed   | CLogControlLock
>    921  Lock          | extend
>   1195  LWLockNamed   | XidGenLock
>   1248  LWLockNamed   | ProcArrayLock
>   3349  Lock          | transactionid
>  85957  Client        | ClientRead
> 135935                |
>
> I then started a run at 96 clients which I accidentally killed shortly
> before it was scheduled to finish, but the results are not much
> different; there is no hint of the runaway CLogControlLock contention
> that Dilip sees on power2.
>
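
Counts like the ones quoted above can be gathered by repeatedly sampling wait_event_type and wait_event from pg_stat_activity and tallying the pairs. A minimal Python sketch of the tallying step only, assuming the samples have already been collected; the function names and the sample data are hypothetical and are not taken from Robert's run:

```python
from collections import Counter


def tally_wait_events(samples):
    """Count (wait_event_type, wait_event) pairs, least frequent first.

    `samples` is an iterable of 2-tuples. None stands for a backend that
    was not waiting (NULL in pg_stat_activity) and is shown as blank.
    """
    counts = Counter((t or "", e or "") for t, e in samples)
    # Sort ascending by count, matching the layout of the quoted table.
    return sorted((n, t, e) for (t, e), n in counts.items())


def format_tally(rows):
    """Render rows as 'count type | event' lines."""
    return "\n".join(f"{n:>7}  {t:<13} | {e}" for n, t, e in rows)


# Hypothetical samples, for illustration only.
samples = [
    ("Client", "ClientRead"),
    (None, None),
    ("LWLockNamed", "CLogControlLock"),
    (None, None),
    ("Client", "ClientRead"),
    (None, None),
]
print(format_tally(tally_wait_events(samples)))
```

The blank row in such a tally corresponds to samples where the backend was not waiting on anything.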

What shared_buffers size were you using? I assume the data set fit into
shared buffers, right?

FWIW, as I explained in the lengthy post earlier today, I can actually
reproduce significant CLogControlLock contention (and the patches do
reduce it), even on x86_64.

For example consider these two tests:

* http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
* http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip

However, it seems I can also reproduce fairly bad regressions, for
example in this case with a data set exceeding shared_buffers:

* http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip
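
A rough way to check whether a given scale factor fits into shared_buffers: each pgbench scale unit adds 100,000 rows to pgbench_accounts, which works out to very roughly 15 MB per unit. The sketch below uses that rule of thumb. Both the 15 MB figure and the 16 GB shared_buffers value are assumptions made for illustration; the latter is a hypothetical placeholder, not the setting used in any of the runs above.

```python
# Rough rule of thumb for pgbench data size per scale unit (an assumption,
# not an exact figure; actual size depends on version and fillfactor).
MB_PER_SCALE_UNIT = 15


def estimated_dataset_mb(scale):
    """Approximate on-disk size of a pgbench data set in MB."""
    return scale * MB_PER_SCALE_UNIT


def fits_in_shared_buffers(scale, shared_buffers_mb):
    """True if the estimated data set is no larger than shared_buffers."""
    return estimated_dataset_mb(scale) <= shared_buffers_mb


shared_buffers_mb = 16 * 1024  # hypothetical value, for illustration only

for scale in (300, 3000):
    size = estimated_dataset_mb(scale)
    verdict = "fits" if fits_in_shared_buffers(scale, shared_buffers_mb) else "exceeds"
    print(f"scale {scale}: ~{size} MB, {verdict} shared_buffers")
```

Under these assumptions scale 300 stays well inside shared_buffers while scale 3000 does not, which is the distinction between the two groups of tests linked above.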

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
