Re: heavily contended lwlocks with long wait queues scale badly

From: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: heavily contended lwlocks with long wait queues scale badly
Date: 2024-01-17 04:24:49
Message-ID: bb410d95-91d2-4c84-986c-7009d5477dd0@postgresql.org
Lists: pgsql-hackers

On 1/16/24 1:11 AM, Michael Paquier wrote:
> On Thu, Jan 11, 2024 at 09:47:33AM -0500, Jonathan S. Katz wrote:
>> I have similar data sources to Nathan/Michael and I'm trying to avoid piling
>> on, but one case that's interesting occurred after a major version upgrade
>> from PG10 to PG14 on a database supporting a very active/highly concurrent
>> workload. On inspection, it seems like backpatching would help this
>> particular case.
>>
>> With 10/11 EOL, I do wonder if we'll see more of these reports on upgrade to
>> < PG16.
>>
>> (I was in favor of backpatching prior; opinion is unchanged).
>
> Hearing nothing, I have prepared a set of patches for v12~v15,
> checking all the lwlock paths for all the branches. At the end the
> set of changes looks rather sane to me regarding the queue handling.
>
> I have also run some numbers on all the branches, and the test case
> posted upthread falls off dramatically after 512 concurrent
> connections at the top of all the stable branches :(
>
> For example on REL_12_STABLE with and without the patch attached:
>  num            v12      v12+patch
>    1   29717.151665   29096.707588
>    2   63257.709301   61889.476318
>    4  127921.873393  124575.901330
>    8  231400.571662  230562.725174
>   16  343911.185351  312432.897015
>   32  291748.985280  281011.787701
>   64  268998.728648  269975.605115
>  128  297332.597018  286449.176950
>  256  243902.817657  240559.122309
>  512  190069.602270  194510.718508
>  768   58915.650225  165714.707198
> 1024   39920.950552  149433.836901
> 2048   16922.391688  108164.301054
> 4096    6229.063321   69032.338708
>
> I'd like to apply that, just let me know if you have any comments
> and/or objections.

Wow. All I can say is that my opinion remains unchanged on going forward
with backpatching.

Looking at the code, I understand the argument for not backpatching given
that we modify the struct, but this does seem low-risk/high-reward and
should help PostgreSQL run better on these higher-throughput workloads.
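
For a sense of scale, here is a quick sketch (Python, just arithmetic on
the REL_12_STABLE figures quoted above, not part of the patch) that
computes the patched/unpatched ratio per connection count. The names
RESULTS and speedups are mine, purely for illustration. By this
arithmetic the two columns stay within about 10% of each other up to 512
connections, and the patched build comes out roughly 2.8x ahead at 768
and about 11x ahead at 4096.

```python
# Figures copied from the quoted REL_12_STABLE table:
# (connections, unpatched v12, v12 + patch). Units are as reported upthread.
RESULTS = [
    (1, 29717.151665, 29096.707588),
    (2, 63257.709301, 61889.476318),
    (4, 127921.873393, 124575.901330),
    (8, 231400.571662, 230562.725174),
    (16, 343911.185351, 312432.897015),
    (32, 291748.985280, 281011.787701),
    (64, 268998.728648, 269975.605115),
    (128, 297332.597018, 286449.176950),
    (256, 243902.817657, 240559.122309),
    (512, 190069.602270, 194510.718508),
    (768, 58915.650225, 165714.707198),
    (1024, 39920.950552, 149433.836901),
    (2048, 16922.391688, 108164.301054),
    (4096, 6229.063321, 69032.338708),
]


def speedups(rows):
    """Return a list of (connections, patched / unpatched) pairs."""
    return [(n, patched / base) for n, base, patched in rows]


if __name__ == "__main__":
    # Print one line per connection count with the ratio to two decimals.
    for n, ratio in speedups(RESULTS):
        print(f"{n:>5} {ratio:6.2f}x")
```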

Thanks,

Jonathan
