Re: MultiXact\SLRU buffers configuration

From: Thom Brown <thom(at)linux(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, vignesh C <vignesh21(at)gmail(dot)com>, Andrew Borodin <amborodin86(at)gmail(dot)com>, Yura Sokolov <y(dot)sokolov(at)postgrespro(dot)ru>, Andres Freund <andres(at)anarazel(dot)de>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Gilles Darold <gilles(at)darold(dot)net>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers mailing list <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: MultiXact\SLRU buffers configuration
Date: 2024-10-29 16:38:28
Message-ID: CAA-aLv4h=58ggum1-ChLEH3STPXrC4K_cbqYxQ2i2aYMfdwLzA@mail.gmail.com
Lists: pgsql-hackers

On Fri, 23 Aug 2024 at 01:29, Michael Paquier <michael(at)paquier(dot)xyz> wrote:

> On Thu, Aug 22, 2024 at 10:36:38AM -0400, Alvaro Herrera wrote:
> > On 2024-Aug-22, Michael Paquier wrote:
> >> I'm not sure that we need to get down to that until somebody has a
> >> case where they want to rely on stats of injection points for their
> >> stuff. At this stage, I only want the stats to be enabled to provide
> >> automated checks for the custom pgstats APIs, so disabling it by
> >> default and enabling it only in the stats test of the module
> >> injection_points sounds kind of enough to me for now.
> >
> > Oh! I thought the stats were useful by themselves.
>
> Yep, currently they're not, but I don't want to rule out that they
> ever will be, either. Perhaps there would be a case where somebody would
> like to run a callback N times and trigger a condition? That's
> something where the stats could be useful, but I don't have a specific
> case for that now. I'm just imagining possibilities.
>

I believe I am seeing the problem discussed here occurring on a production
system running 15.6. It causes ever-increasing replay lag on the standby
until I cancel the offending process on the standby and force it to process
its interrupts.
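
For context, something along these lines makes the situation visible on the
standby. This is a sketch only: the 5-minute threshold is arbitrary, and note
that the stuck backend shows up as active with no wait_event, since
pg_usleep() doesn't report one:

-- Replay lag as seen on the standby:
SELECT now() - pg_last_xact_replay_timestamp() AS replay_lag;

-- Long-running backends that may be the offenders (threshold is illustrative):
SELECT pid, state, wait_event, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes'
ORDER BY runtime DESC;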

Here's the backtrace before I do that:

#0 0x00007f4503b81876 in select () from /lib64/libc.so.6
#1 0x0000558b0956891a in pg_usleep (microsec=microsec@entry=1000) at pgsleep.c:56
#2 0x0000558b0917e01a in GetMultiXactIdMembers (from_pgupgrade=false, onlyLock=<optimized out>, members=0x7ffcd2a9f1e0, multi=109187502) at multixact.c:1392
#3 GetMultiXactIdMembers (multi=109187502, members=members@entry=0x7ffcd2a9f1e0, from_pgupgrade=from_pgupgrade@entry=false, onlyLock=onlyLock@entry=false) at multixact.c:1224
#4 0x0000558b0913de15 in MultiXactIdGetUpdateXid (xmax=<optimized out>, t_infomask=<optimized out>) at heapam.c:6924
#5 0x0000558b09146028 in HeapTupleGetUpdateXid (tuple=tuple@entry=0x7f440d428308) at heapam.c:6965
#6 0x0000558b0914c02f in HeapTupleSatisfiesMVCC (htup=0x558b0b7cbf20, htup=0x558b0b7cbf20, buffer=8053429, snapshot=0x558b0b63a2d8) at heapam_visibility.c:1089
#7 HeapTupleSatisfiesVisibility (tup=tup@entry=0x7ffcd2a9f2b0, snapshot=snapshot@entry=0x558b0b63a2d8, buffer=buffer@entry=8053429) at heapam_visibility.c:1771
#8 0x0000558b0913e819 in heapgetpage (sscan=sscan@entry=0x558b0b7ccfa0, page=page@entry=115) at heapam.c:468
#9 0x0000558b0913eb7e in heapgettup_pagemode (scan=scan@entry=0x558b0b7ccfa0, dir=ForwardScanDirection, nkeys=0, key=0x0) at heapam.c:1120
#10 0x0000558b0913fb5e in heap_getnextslot (sscan=0x558b0b7ccfa0, direction=<optimized out>, slot=0x558b0b7cc000) at heapam.c:1352
#11 0x0000558b092c0e7a in table_scan_getnextslot (slot=0x558b0b7cc000, direction=ForwardScanDirection, sscan=<optimized out>) at ../../../src/include/access/tableam.h:1046
#12 SeqNext (node=0x558b0b7cbe10) at nodeSeqscan.c:80
#13 0x0000558b0929b9bf in ExecScan (node=0x558b0b7cbe10, accessMtd=0x558b092c0df0 <SeqNext>, recheckMtd=0x558b092c0dc0 <SeqRecheck>) at execScan.c:198
#14 0x0000558b09292cb2 in ExecProcNode (node=0x558b0b7cbe10) at ../../../src/include/executor/executor.h:262
#15 ExecutePlan (execute_once=<optimized out>, dest=0x558b0bca1350, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x558b0b7cbe10, estate=0x558b0b7cbbe8) at execMain.c:1636
#16 standard_ExecutorRun (queryDesc=0x558b0b8c9798, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:363
#17 0x00007f44f64d43c5 in pgss_ExecutorRun (queryDesc=0x558b0b8c9798, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:1010
#18 0x0000558b093fda0f in PortalRunSelect (portal=portal@entry=0x558b0b6ba458, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x558b0bca1350) at pquery.c:924
#19 0x0000558b093fedb8 in PortalRun (portal=portal@entry=0x558b0b6ba458, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x558b0bca1350, altdest=altdest@entry=0x558b0bca1350, qc=0x7ffcd2a9f7a0) at pquery.c:768
#20 0x0000558b093fb243 in exec_simple_query (query_string=0x558b0b6170c8 "<redacted>") at postgres.c:1250
#21 0x0000558b093fd412 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4598
#22 0x0000558b0937e170 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4514
#23 BackendStartup (port=<optimized out>) at postmaster.c:4242
#24 ServerLoop () at postmaster.c:1809
#25 0x0000558b0937f147 in PostmasterMain (argc=argc@entry=5, argv=argv@entry=0x558b0b5d0a30) at postmaster.c:1481
#26 0x0000558b09100a2c in main (argc=5, argv=0x558b0b5d0a30) at main.c:202

This occurred twice, meaning two processes needed terminating.
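
For anyone else hitting this, a minimal sketch of that step, reusing the
filter from the query above to find the pid. Whether a plain cancel is
honoured presumably depends on the sleep loop reaching
CHECK_FOR_INTERRUPTS(), so termination is the fallback:

-- Ask the backend to cancel its current query:
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes';

-- If the interrupt is never serviced, terminate the backend instead:
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '5 minutes';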

Thom
