From: | Michael Paquier <michael(at)paquier(dot)xyz> |
---|---|
To: | feichanghong <feichanghong(at)qq(dot)com> |
Cc: | pgsql-bugs <pgsql-bugs(at)lists(dot)postgresql(dot)org>, andres <andres(at)anarazel(dot)de>, "sawada(dot)mshk" <sawada(dot)mshk(at)gmail(dot)com>, "horikyota(dot)ntt" <horikyota(dot)ntt(at)gmail(dot)com> |
Subject: | Re: ReplicationSlotRelease may set the statusFlags of other processes in PG14 |
Date: | 2024-03-19 03:57:51 |
Message-ID: | ZfkNP1OdgBSPPTsR@paquier.xyz |
Lists: | pgsql-bugs |
On Sat, Mar 16, 2024 at 10:29:03PM +0800, feichanghong wrote:
> A process utilizing replication slots (usually walsender) calls callback
> functions in the order of RemoveProcFromArray->ProcKill upon abnormal exit.
> Within RemoveProcFromArray, MyProc is already removed from the ProcArray.
> ProcKill then attempts to set ProcGlobal->statusFlags[MyProc->pgxactoff] again
> via ReplicationSlotRelease. By this time, the flag may already be assigned to
> another process.
Oops.
> To replicate the issue, execute the following steps:
> 1. Apply the attached v1-0000-v14-invalidate-pgxactoff-after-remove-pgproc.patch,
> where pgxactoff is set to an invalid value in ProcArrayRemove, and some
> checks are added.
> 2. Use the SQL below to terminate the walsender process.
> ```
> select pg_terminate_backend(pid) from pg_stat_activity where backend_type = 'walsender';
> ```
> # Fix
>
> To fix the issue, I have provided some patches in the attachment:
> 1. Backpatching 2f6501f into the PG14 version will fix the problem.
> 2. In PG14-head, ProcArrayRemove needs to reset pgxactoff, and some assert
> checks should be done when setting ProcGlobal->statusFlags.
Yeah, that's something that we had better fix in all stable branches.
The asserts would offer some protection moving on, but I would take
the safer move of only adding a protection like what you are
suggesting on HEAD and not in stable branches, just in case we're
missing something around them.
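
For readers following along, here is a minimal standalone sketch of the hazard and of the kind of protection discussed above. It is a toy model, not PostgreSQL code: ToyProc, toy_add, toy_remove, toy_clear_flag, walsender_exit_clobbers_neighbor and the flag value are all hypothetical names made up for illustration, and the dense-array compaction is only assumed to resemble what ProcArrayRemove does. The only point is that a stale pgxactoff kept after removal can index a slot that now belongs to another process.

```c
#include <stddef.h>

#define MAX_PROCS   4
#define INVALID_OFF (-1)
#define TOY_FLAG    0x10        /* hypothetical flag bit, not the real value */

/* Toy stand-in for a PGPROC entry: only the offset into the dense arrays. */
typedef struct ToyProc
{
    int pgxactoff;
} ToyProc;

static ToyProc      *procs[MAX_PROCS];        /* dense array of live procs */
static unsigned char statusFlags[MAX_PROCS];  /* parallel dense flag array */
static int           numProcs = 0;

/* Append a proc at the end of the dense arrays and record its offset. */
static void
toy_add(ToyProc *p)
{
    p->pgxactoff = numProcs;
    procs[numProcs] = p;
    statusFlags[numProcs] = 0;
    numProcs++;
}

/*
 * Remove a proc and compact the arrays.  With invalidate = 0 the removed
 * proc keeps its old (now stale) offset; with invalidate = 1 the offset
 * is reset, which is the protection sketched here.
 */
static void
toy_remove(ToyProc *p, int invalidate)
{
    for (int i = p->pgxactoff; i < numProcs - 1; i++)
    {
        procs[i] = procs[i + 1];
        statusFlags[i] = statusFlags[i + 1];
        procs[i]->pgxactoff = i;
    }
    numProcs--;
    procs[numProcs] = NULL;
    statusFlags[numProcs] = 0;
    if (invalidate)
        p->pgxactoff = INVALID_OFF;
}

/* Clear a flag through the proc's offset; skipped if the offset is invalid. */
static int
toy_clear_flag(ToyProc *p, unsigned char flag)
{
    if (p->pgxactoff == INVALID_OFF)
        return 0;
    statusFlags[p->pgxactoff] &= (unsigned char) ~flag;
    return 1;
}

/*
 * Two processes both have TOY_FLAG set.  The first one exits: it is removed
 * from the arrays, then a late cleanup step clears the flag via its offset.
 * Returns 1 if the surviving process lost its flag, 0 otherwise.
 */
static int
walsender_exit_clobbers_neighbor(int invalidate)
{
    ToyProc exiting, survivor;

    numProcs = 0;
    toy_add(&exiting);
    toy_add(&survivor);
    statusFlags[exiting.pgxactoff] |= TOY_FLAG;
    statusFlags[survivor.pgxactoff] |= TOY_FLAG;

    toy_remove(&exiting, invalidate);       /* analogous to the removal step */
    (void) toy_clear_flag(&exiting, TOY_FLAG);  /* analogous to the late release */

    return (statusFlags[survivor.pgxactoff] & TOY_FLAG) == 0;
}
```

In this model, calling walsender_exit_clobbers_neighbor(0) shows the surviving process losing its flag through the stale offset, while walsender_exit_clobbers_neighbor(1) leaves it intact because the offset was reset at removal time. In a real backend an assertion on the offset would presumably take the place of the silent skip in toy_clear_flag.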
--
Michael