Re: what to revert

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <kgrittn(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: what to revert
Date: 2016-05-10 18:51:02
Message-ID: fa0e77af-8156-6940-e214-95c78c7fb4c4@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 05/10/2016 07:36 PM, Robert Haas wrote:
> On Tue, May 10, 2016 at 12:31 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>> The following table shows the differences between the disabled and reverted
>> cases like this:
>>
>> sum('reverted' results with N clients)
>> ---------------------------------------- - 1.0
>> sum('disabled' results with N clients)
>>
>> for each scale/client count combination. So for example 4.83% means
>> that with a single client on the smallest data set, the sum of the 5
>> runs for reverted was about 1.0483x that for disabled.
>>
>> scale \ clients        1       16       32       64      128
>>             100    4.83%    2.84%    1.21%    1.16%    3.85%
>>            3000    1.97%    0.83%    1.78%    0.09%    7.70%
>>           10000   -6.94%   -5.24%  -12.98%   -3.02%   -8.78%
>
> /me scratches head.
>
> That doesn't seem like noise, but I don't understand the
> scale-factor-10000 results either. Reverting the patch makes the code
> smaller and removes instructions from critical paths, so it should
> speed things up at least nominally. The question is whether it makes
> enough difference that anyone cares. However, removing unused code
> shouldn't make the system *slower*, but that's what's happening here
> at the higher scale factor.

/me scratches head too

>
> I've seen cases where adding dummy instructions to critical paths
> slows things down at 1 client and speeds them up with many clients.
> That happens because the percentage of time active processes spend
> fighting over the critical locks goes down, which reduces contention
> more than enough to compensate for the cost of executing the dummy
> instructions. If your results showed performance lower at 1 client
> and slightly higher at many clients, I'd suspect an effect of that
> sort. But I can't see why it should depend on the scale factor. That
> suggests that, perhaps, it's having some effect on the impact of
> buffer eviction, maybe due to a difference in shared memory layout.
> But I thought we weren't supposed to have such artifacts any more
> now that we start every allocation on a cache line boundary...
>
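
For reference, the percentages in the table above come from a computation
along these lines - a minimal Python sketch with made-up per-run tps
numbers, not the actual results:

# Sketch of the comparison: for each scale/client combination, sum the
# tps of the 5 runs for the 'reverted' and 'disabled' builds and report
# the relative difference. The numbers below are hypothetical.
tps = {
    'reverted': {(100, 1): [980.0, 1010.0, 995.0, 1005.0, 990.0]},
    'disabled': {(100, 1): [940.0, 955.0, 950.0, 960.0, 945.0]},
}

for key in sorted(tps['reverted']):
    reverted = sum(tps['reverted'][key])
    disabled = sum(tps['disabled'][key])
    diff = reverted / disabled - 1.0
    print("scale=%d clients=%d: %+.2f%%" % (key[0], key[1], diff * 100))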

I think we should look for issues in the testing procedure first,
perhaps try to reproduce it on a different system. Another possibility
is that the revert is not perfectly correct - the code compiles and does
not crash, but maybe there's a subtle issue somewhere.

I'll try to collect some additional data (detailed stats from sar, an
aggregated transaction log, ...) for further analysis, and also increase
the number of runs so that we can compare the individual combinations
more reliably.
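
Roughly along these lines - a sketch only, where the database names, file
names, run counts and durations are all placeholders:

# Rough sketch of the planned data collection (all parameters
# illustrative; assumes one database per scale, already initialized
# with pgbench -i). For each scale/client combination, run pgbench
# several times while capturing sar data and a per-transaction log.
import subprocess

RUNS = 10        # more runs per combination than before
DURATION = 300   # seconds per run

for scale in (100, 3000, 10000):
    for clients in (1, 16, 32, 64, 128):
        for run in range(RUNS):
            tag = "s%d_c%d_r%d" % (scale, clients, run)
            # collect system-level stats once per second for the run
            sar = subprocess.Popen(
                ["sar", "-o", "sar-%s.dat" % tag, "1", str(DURATION)])
            subprocess.check_call(
                ["pgbench",
                 "-c", str(clients),    # client connections
                 "-j", str(clients),    # worker threads
                 "-T", str(DURATION),   # run length in seconds
                 "-l",                  # write per-transaction log
                 "bench_%d" % scale])   # database name (placeholder)
            sar.wait()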

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
