From: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
Cc: | Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Gabriele Bartolini <gabriele(dot)bartolini(at)enterprisedb(dot)com> |
Subject: | Re: crash with synchronized_standby_slots |
Date: | 2024-12-03 17:04:49 |
Message-ID: | 202412031704.thc75krypvts@alvherre.pgsql |
Lists: | pgsql-hackers |
On 2024-Nov-29, Amit Kapila wrote:
> I tried it on my Windows machine and noticed that ReplicationSlotCtl
> is NULL for syslogger, so the problem doesn't occur. The reason is
> that we don't attach to shared memory in syslogger, so ideally
> ReplicationSlotCtl should be NULL. Because we inherit everything
> through the fork for Linux systems and then later for processes that
> don't want to attach to shared memory, we call PGSharedMemoryDetach()
> from postmaster_child_launch(). The PGSharedMemoryDetach won't
> reinitialize the memory pointed to by ReplicationSlotCtl, so, it would
> be an invalid memory.
Heh, interesting. I'm not sure if we should try to do something about
invalid pointers being left around after shmem initialization. Also, is
this the first GUC check_hook that needs to take an LWLock?
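The stale-pointer situation described above can be sketched in plain C. This is only an illustration, not the fix that was pushed: `SlotCtl`, `slot_ctl`, `shmem_attached`, `attach_shmem()`, `detach_shmem()` and `count_slots_guarded()` are hypothetical names, and `malloc`/`free` stand in for mapping and unmapping shared memory. The point is that detaching leaves the global pointer non-NULL, so a hook has to consult some other indicator before dereferencing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a control struct that lives in shared memory. */
typedef struct SlotCtl
{
    int nslots;
} SlotCtl;

SlotCtl *slot_ctl = NULL;       /* plays the role of ReplicationSlotCtl */
bool     shmem_attached = false; /* hypothetical "am I attached?" flag */

/* Simulates attaching: the pointer becomes valid. */
void
attach_shmem(void)
{
    assert(!shmem_attached);
    slot_ctl = malloc(sizeof(SlotCtl));
    if (slot_ctl == NULL)
        abort();
    slot_ctl->nslots = 3;
    shmem_attached = true;
}

/*
 * Simulates detaching: the memory goes away, but slot_ctl is deliberately
 * left untouched, so it now dangles (as in a forked child that detaches).
 */
void
detach_shmem(void)
{
    free(slot_ctl);
    shmem_attached = false;
}

/*
 * Guarded access: test the attach flag first and only then look at the
 * pointer. Returns -1 when validation against shared state is impossible.
 */
int
count_slots_guarded(void)
{
    if (!shmem_attached || slot_ctl == NULL)
        return -1;
    return slot_ctl->nslots;
}
```

A NULL test on `slot_ctl` alone would not be enough here, which is the behavior difference between the Windows case (pointer is NULL) and the fork case (pointer is stale) noted in the quoted text.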
Anyway, I have pushed this.
BTW it occurs to me that there might well be some sort of thundering
herd problem if every process needs to run the check_hook when a SIGHUP
is broadcast: they'll all be waiting on that particular LWLock and
running the same validation locally again and again. I bet if you have a
few thousand backends (hi Jakub! [1]) it's problematic. Maybe we need a
different way to validate the GUC. I don't know what that might be, but
doing the validation once and storing the result in shmem might be
better.
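To make the "validate once, store the result in shmem" idea concrete, here is a rough sketch using C11 atomics. Everything in it is an assumption for illustration, not PostgreSQL code: `SharedValidation`, `check_with_shared_cache()`, `validate_slot_list()` and the notion of a per-reload "generation" number are hypothetical. The cached generation and result are packed into one word so a reader never sees a half-updated pair. It does not stop several processes from validating concurrently the first time, it only lets later ones skip the work.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical struct that would live in shared memory. "packed" holds
 * (generation << 1) | result; generation 0 means "nothing cached yet".
 */
typedef struct SharedValidation
{
    _Atomic uint64_t packed;
} SharedValidation;

void
shared_validation_init(SharedValidation *sv)
{
    atomic_init(&sv->packed, 0);
}

/* Hypothetical expensive validator; counts calls so the effect is visible. */
int validate_calls = 0;

bool
validate_slot_list(const char *value)
{
    validate_calls++;
    return value != NULL && value[0] != '\0';
}

/*
 * Return the validation result for configuration generation "gen",
 * reusing a cached result if some process already published one.
 */
bool
check_with_shared_cache(SharedValidation *sv, uint64_t gen, const char *value)
{
    uint64_t cached;
    uint64_t mine;
    bool     ok;

    assert(gen != 0);           /* 0 is reserved for "no result cached" */

    /* Fast path: a result for this generation is already published. */
    cached = atomic_load(&sv->packed);
    if ((cached >> 1) == gen)
        return (cached & 1) != 0;

    /* Slow path: validate locally, then try to publish the result. */
    ok = validate_slot_list(value);
    mine = (gen << 1) | (ok ? 1u : 0u);

    /* Publish only if the stored generation is older than ours. */
    while ((cached >> 1) < gen &&
           !atomic_compare_exchange_weak(&sv->packed, &cached, mine))
        ;

    return ok;
}
```

With this shape, two calls for the same generation run the validator once, and a new generation triggers exactly one more run. Whether a cached result stays valid while slots are created or dropped is a separate question that this sketch does not address.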
On 2024-Nov-29, Zhijie Hou (Fujitsu) wrote:
> I can also reproduce this bug and confirmed that the bug is fixed
> after applying the patch. In addition to the regression tests, I also
> manually tested the behavior of the postmaster, walsender, and user
> backend after reloading the configuration, and they all work as
> expected.
Many thanks for testing!
[1] https://postgr.es/m/CAKZiRmwrBjCbCJ433wV5zjvwt_OuY7BsVX12MBKiBu+eNZDm6g@mail.gmail.com
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
"I apologize for the confusion in my previous responses.
There appears to be an error." (ChatGPT)