crash with synchronized_standby_slots

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, Gabriele Bartolini <gabriele(dot)bartolini(at)enterprisedb(dot)com>
Subject: crash with synchronized_standby_slots
Date: 2024-11-28 12:16:18
Message-ID: 202411281216.sutbxtr6idnn@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Gabriele just reported a crash when changing synchronized_standby_slots
under SIGHUP and logging collector working. The problem apparently is
that validate_sync_standby_slots is run for the GUC check routine, and
it requires acquiring an LWLock; but syslogger doesn't have a PGPROC so
that doesn't work.

Apparently we already have a hack in place against the same problem in
postmaster (and elsewhere?), by testing that ReplicationSlotCtl is not
null. But that hack seems to be incomplete, as the crash here attests.

To reproduce, simply start with no synchronized_standby_slots setting
and logging_collector=on, then change the value in postgresql.conf and
reload.

One option to fix this might be as attached. The point here is that
processes without a PGPROC don't need or care to have a correct setting,
so we just skip verifying it on those. AFAICS this is more general than
the test for ReplicationSlotCtl, so I just removed that one.

Here's the backtrace Gabriele showed me.

[New LWP 29]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: freddie: logger '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at ./build/../src/include/port/atomics/generic.h:48
Download failed: Invalid argument. Continuing without source file ./build/src/backend/storage/lmgr/./build/../src/include/port/atomics/generic.h.
48 ./build/../src/include/port/atomics/generic.h: No such file or directory.
#0 pg_atomic_read_u32_impl (ptr=0x7f0cf21a1f84) at ./build/../src/include/port/atomics/generic.h:48
No locals.
#1 pg_atomic_read_u32 (ptr=0x7f0cf21a1f84) at ./build/../src/include/port/atomics.h:237
No locals.
#2 LWLockAttemptLock (lock=lock(at)entry=0x7f0cf21a1f80, mode=mode(at)entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:796
old_state = <optimized out>
#3 0x0000557cbe97ba60 in LWLockAcquire (lock=0x7f0cf21a1f80, mode=mode(at)entry=LW_SHARED) at ./build/../src/backend/storage/lmgr/lwlock.c:1235
mustwait = <optimized out>
proc = 0x0
result = true
extraWaits = 0
__func__ = "LWLockAcquire"
#4 0x0000557cbe935919 in validate_sync_standby_slots (elemlist=0x7ffda5aa60d0, rawname=0x557cfd704c98 "_cnpg_freddie_2") at ./build/../src/backend/replication/slot.c:2443
ok = true
ok = <optimized out>
name = <optimized out>
name__outerloop = <optimized out>
name__state = {l = <optimized out>, i = <optimized out>}
slot = <optimized out>
#5 check_synchronized_standby_slots (newval=<optimized out>, extra=0x7ffda5aa6300, source=<optimized out>) at ./build/../src/backend/replication/slot.c:2494
rawname = 0x557cfd704c98 "_cnpg_freddie_2"
ptr = <optimized out>
elemlist = 0x557cfd7041c8
size = <optimized out>
ok = <optimized out>
config = <optimized out>
#6 0x0000557cbeae95de in call_string_check_hook (conf=conf(at)entry=0x557cbee2f8b0 <ConfigureNamesString+14000>, newval=newval(at)entry=0x7ffda5aa62f8, extra=extra(at)entry=0x7ffda5aa6300, source=source(at)entry=PGC_S_FILE, elevel=elevel(at)entry=12) at ./build/../src/backend/utils/misc/guc.c:6925
_save_exception_stack = 0x0
_save_context_stack = 0x0
_local_sigjmp_buf = {{__jmpbuf = {93994266851504, -6990900416883569069, 140727382860536, 12, 93994266851504, 93995316414744, -6990900416982135213, -3748856269321122221}, __mask_was_saved = 0, __saved_mask = {__val = {140727382860536, 140727382860272, 93995316305072, 139693771467072, 93994263427300, 4662694561863172096, 93994263387537, 140727382860536, 93994263505961, 93994266851504, 140727382860536, 140727382860336, 93994263421816, 140727382860536, 12, 93994266851504}}}}
_do_rethrow = false
result = true
__func__ = "call_string_check_hook"
#7 0x0000557cbeaec1ba in parse_and_validate_value (record=0x557cbee2f8b0 <ConfigureNamesString+14000>, name=0x557cfd765840 "synchronized_standby_slots", value=<optimized out>, source=PGC_S_FILE, elevel=12, newval=0x7ffda5aa62f8, newextra=0x7ffda5aa6300) at ./build/../src/backend/utils/misc/guc.c:3260
conf = 0x557cbee2f8b0 <ConfigureNamesString+14000>
__func__ = "parse_and_validate_value"
#8 0x0000557cbeaecd6f in set_config_with_handle (name=0x557cfd765840 "synchronized_standby_slots", handle=<optimized out>, value=0x557cfd765868 "_cnpg_freddie_2", context=PGC_SIGHUP, source=PGC_S_FILE, srole=10, action=GUC_ACTION_SET, changeVal=<optimized out>, elevel=12, is_reload=false) at ./build/../src/backend/utils/misc/guc.c:4013
conf = 0x557cbee2f8b0 <ConfigureNamesString+14000>
orig_context = PGC_SIGHUP
orig_source = PGC_S_FILE
orig_srole = 10
record = 0x557cbee2f8b0 <ConfigureNamesString+14000>
newval_union = {boolval = 24, intval = -42858216, realval = 4.6439856710502817e-310, stringval = 0x557cfd720918 "_cnpg_freddie_2", enumval = -42858216}
newextra = 0x0
prohibitValueChange = false
makeDefault = <optimized out>
__func__ = "set_config_with_handle"
#9 0x0000557cbeaef400 in set_config_option (is_reload=false, elevel=0, changeVal=true, action=GUC_ACTION_SET, source=PGC_S_FILE, context=PGC_SIGHUP, value=<optimized out>, name=<optimized out>) at ./build/../src/backend/utils/misc/guc.c:3363
srole = 10
srole = <optimized out>
#10 ProcessConfigFileInternal (context=context(at)entry=PGC_SIGHUP, applySettings=applySettings(at)entry=true, elevel=elevel(at)entry=13) at ./build/../src/backend/utils/misc/guc.c:557
pre_value = 0x0
scres = <optimized out>
error = false
applying = true
ConfFileWithError = 0x557cfd720100 "/var/lib/postgresql/data/pgdata/postgresql.conf"
item = 0x557cfd765880
head = 0x557cfd704248
tail = 0x557cfd7664c0
status = {hashp = 0x557cfd7219c8, curBucket = 512, curEntry = 0x0}
hentry = <optimized out>
bail_out = <optimized out>
__func__ = "ProcessConfigFileInternal"
#11 0x0000557cbeaf2e81 in ProcessConfigFile (context=context(at)entry=PGC_SIGHUP) at ./build/../src/backend/utils/misc/guc-file.l:152
elevel = 13
config_cxt = 0x557cfd7040d0
caller_cxt = 0x557cfd6fd800
#12 0x0000557cbe9001cc in SysLoggerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ./build/../src/backend/postmaster/syslogger.c:369
time_based_rotation = false
size_rotation_for = 0
cur_timeout = <optimized out>
event = {pos = 0, events = 1, fd = -1, user_data = 0x0}
rc = <optimized out>
logbuffer = "\000\000\300\000\034\000\000\000!2024-11-28 10:19:42.415 UTC,,,28,,6748439f.1c,8,,2024-11-28 10:19:11 UTC,,0,LOG,00000,\"parameter \"\"synchronous_standby_names\"\" changed to \"\"ANY 1 (\"\"freddie-2\"\")\"\"\",,,,,,,,,\"\",\"postmaster\",,0"...
bytes_in_logbuffer = 0
currentLogDir = 0x557cfd6fe6d8 "/controller/log"
currentLogFilename = 0x557cfd6fe6f0 "postgres"
currentLogRotationAge = 0
now = 1732789151
wes = 0x557cfd6fe708
__func__ = "SysLoggerMain"
#13 0x0000557cbe8f908a in postmaster_child_launch (child_type=child_type(at)entry=B_LOGGER, startup_data=startup_data(at)entry=0x0, startup_data_len=startup_data_len(at)entry=0, client_sock=client_sock(at)entry=0x0) at ./build/../src/backend/postmaster/launch_backend.c:277
pid = <optimized out>
#14 0x0000557cbe8ff9c2 in SysLogger_Start () at ./build/../src/backend/postmaster/syslogger.c:706
sysloggerPid = <optimized out>
filename = 0x557cfd75a160 "og output will go to log destination \"csvlog\".\n"
__func__ = "SysLogger_Start"
#15 0x0000557cbe8fe42e in PostmasterMain (argc=argc(at)entry=3, argv=argv(at)entry=0x557cfd6fd7a0) at ./build/../src/backend/postmaster/postmaster.c:1054
opt = <optimized out>
status = <optimized out>
userDoption = <optimized out>
listen_addr_saved = false
output_config_variable = 0x0
__func__ = "PostmasterMain"
#16 0x0000557cbe6258ec in main (argc=3, argv=0x557cfd6fd7a0) at ./build/../src/backend/main/main.c:197
do_check_root = <optimized out>

--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"Once again, thank you and all of the developers for your hard work on
PostgreSQL. This is by far the most pleasant management experience of
any database I've worked on." (Dan Harris)
http://archives.postgresql.org/pgsql-performance/2006-04/msg00247.php

Attachment Content-Type Size
0001-Fix-synchronized_standby_slots-GUC-check-hook.patch text/x-diff 1.6 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message jian he 2024-11-28 12:33:07 Re: Remove useless GROUP BY columns considering unique index
Previous Message Dean Rasheed 2024-11-28 11:45:35 Re: Adding OLD/NEW support to RETURNING