Re: heavily contended lwlocks with long wait queues scale badly

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>
Cc: Nathan Bossart <nathandbossart(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>
Subject: Re: heavily contended lwlocks with long wait queues scale badly
Date: 2024-01-16 06:11:48
Message-ID: ZaYeJDWKxbUIzvUA@paquier.xyz
Lists: pgsql-hackers

On Thu, Jan 11, 2024 at 09:47:33AM -0500, Jonathan S. Katz wrote:
> I have similar data sources to Nathan/Michael and I'm trying to avoid piling
> on, but one case that's interesting occurred after a major version upgrade
> from PG10 to PG14 on a database supporting a very active/highly concurrent
> workload. On inspection, it seems like backpatching would help this
> particular case.
>
> With 10/11 EOL, I do wonder if we'll see more of these reports on upgrade to
> < PG16.
>
> (I was in favor of backpatching prior; opinion is unchanged).

Hearing nothing, I have prepared a set of patches for v12~v15,
checking all the lwlock paths in each branch. In the end, the
set of changes looks rather sane to me regarding the queue handling.

I have also run some numbers on all the branches, and the throughput
of the test case posted upthread falls off dramatically beyond 512
concurrent connections at the top of all the stable branches :(

For example, on REL_12_STABLE with and without the attached patch:

 num            v12      v12+patch
   1   29717.151665   29096.707588
   2   63257.709301   61889.476318
   4  127921.873393  124575.901330
   8  231400.571662  230562.725174
  16  343911.185351  312432.897015
  32  291748.985280  281011.787701
  64  268998.728648  269975.605115
 128  297332.597018  286449.176950
 256  243902.817657  240559.122309
 512  190069.602270  194510.718508
 768   58915.650225  165714.707198
1024   39920.950552  149433.836901
2048   16922.391688  108164.301054
4096    6229.063321   69032.338708
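
To make the effect easier to read, the figures above can be reduced to a per-row ratio of patched to unpatched results. The following is only a small sketch in Python: the numbers are copied from the table, while the names RESULTS and ratios are made up for illustration and are not part of the patch or of the test case posted upthread.

```python
# Figures copied from the REL_12_STABLE table above.
# Each tuple is (num, v12, v12+patch); the names here are illustrative only.
RESULTS = [
    (1, 29717.151665, 29096.707588),
    (2, 63257.709301, 61889.476318),
    (4, 127921.873393, 124575.901330),
    (8, 231400.571662, 230562.725174),
    (16, 343911.185351, 312432.897015),
    (32, 291748.985280, 281011.787701),
    (64, 268998.728648, 269975.605115),
    (128, 297332.597018, 286449.176950),
    (256, 243902.817657, 240559.122309),
    (512, 190069.602270, 194510.718508),
    (768, 58915.650225, 165714.707198),
    (1024, 39920.950552, 149433.836901),
    (2048, 16922.391688, 108164.301054),
    (4096, 6229.063321, 69032.338708),
]


def ratios(rows):
    """Return (num, patched / unpatched) for every row of the table."""
    return [(num, patched / base) for num, base, patched in rows]


if __name__ == "__main__":
    for num, ratio in ratios(RESULTS):
        print(f"{num:>5} {ratio:6.2f}x")
```

Read this way, the two columns stay within roughly 10% of each other up to 512, and the patched build is ahead by about 2.8x at 768 and about 11x at 4096.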

I'd like to apply that, just let me know if you have any comments
and/or objections.
--
Michael

Attachment Content-Type Size
0001-lwlock-Fix-quadratic-behavior-with-very-long-wai.v15.patch text/x-diff 9.1 KB
0001-lwlock-Fix-quadratic-behavior-with-very-long-wai.v14.patch text/x-diff 9.0 KB
0001-lwlock-Fix-quadratic-behavior-with-very-long-wai.v13.patch text/x-diff 9.0 KB
0001-lwlock-Fix-quadratic-behavior-with-very-long-wai.v12.patch text/x-diff 9.1 KB
