From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, alvherre(at)2ndquadrant(dot)com |
Cc: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Review for GetWALAvailability() |
Date: | 2020-06-17 08:01:11 |
Message-ID: | f84972e2-f4ca-4079-4eba-0187e6c904c2@oss.nttdata.com |
Lists: | pgsql-hackers |
On 2020/06/17 12:10, Kyotaro Horiguchi wrote:
> At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
>> On 2020-Jun-17, Fujii Masao wrote:
>>> On 2020/06/17 3:50, Alvaro Herrera wrote:
>>
>>> So InvalidateObsoleteReplicationSlots() can terminate normal backends.
>>> But do we want to do this? If so, should we add a note about this
>>> case to the docs? Otherwise users would be surprised when backends are
>>> terminated because of max_slot_wal_keep_size. I guess that this would
>>> rarely happen, though.
>>
>> Well, if we could distinguish a walsender from a non-walsender process,
>> then maybe it would make sense to leave backends alive. But do we want
>> that? I admit I don't know what would be the reason to have a
>> non-walsender process with an active slot, so I don't have a good
>> opinion on what to do in this case.
>
> The non-walsender backend is actually doing replication work.
> Shouldn't it rather be killed?
I have no better opinion about this. So I agree to leave the logic as it is,
at least for now; i.e., we terminate the process owning the slot, whatever
type of process it is.
>
>>>>> + /*
>>>>> + * Signal to terminate the process using the replication slot.
>>>>> + *
>>>>> + * Try to signal every 100ms until it succeeds.
>>>>> + */
>>>>> +     if (!killed && kill(active_pid, SIGTERM) == 0)
>>>>> +         killed = true;
>>>>> +     ConditionVariableTimedSleep(&slot->active_cv, 100,
>>>>> +                                 WAIT_EVENT_REPLICATION_SLOT_DROP);
>>>>> + } while (ReplicationSlotIsActive(slot, NULL));
>>>>
>>>> Note that here you're signalling only once and then sleeping many times
>>>> in increments of 100ms -- you're not signalling every 100ms as the
>>>> comment claims -- unless the signal fails, but you don't really expect
>>>> that. On the contrary, I'd claim that the logic is reversed: if the
>>>> signal fails, *then* you should stop signalling.
>>>
>>> You mean that, in this code path, signaling fails only when the target
>>> process disappears just before signaling. So if it fails, slot->active_pid
>>> is expected to become 0 even without signaling again. Right?
>>
>> I guess kill() can also fail if the PID now belongs to a process owned
>> by a different user.
Yes. This case means that the PostgreSQL process using the slot disappeared
and the same PID was assigned to a non-PostgreSQL process. So if kill() fails
for this reason, we don't need to call kill() again.
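
To make the two failure cases concrete, here is a minimal standalone sketch in plain POSIX C. It is not the code in the attached patch. The names signal_owner_once(), terminate_child() and SignalOutcome are made up for illustration, and waitpid() polling stands in for waiting on slot->active_cv, so it only works on a child of the calling process. It shows one reading of the point above: attempt the signal exactly once, whether or not it succeeds, and then just wait.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical classification of a single kill() attempt. */
typedef enum
{
    OUTCOME_DELIVERED,      /* kill() returned 0 */
    OUTCOME_TARGET_GONE,    /* ESRCH: no such process any more */
    OUTCOME_NOT_PERMITTED,  /* EPERM: PID now belongs to another user */
    OUTCOME_OTHER_ERROR
} SignalOutcome;

/* Send signo to pid once and report why it failed, if it did. */
static SignalOutcome
signal_owner_once(pid_t pid, int signo)
{
    assert(pid > 0);        /* never signal a process group by accident */

    if (kill(pid, signo) == 0)
        return OUTCOME_DELIVERED;
    if (errno == ESRCH)
        return OUTCOME_TARGET_GONE;
    if (errno == EPERM)
        return OUTCOME_NOT_PERMITTED;
    return OUTCOME_OTHER_ERROR;
}

/*
 * Signal a child at most once, then poll every 100ms until it is gone.
 * Returns true if the child was reaped within max_polls polls.
 * *signals_sent counts successfully delivered signals (0 or 1).
 */
static bool
terminate_child(pid_t pid, int max_polls, int *signals_sent)
{
    const struct timespec interval = {0, 100 * 1000 * 1000};
    bool        attempted = false;

    *signals_sent = 0;
    for (int i = 0; i < max_polls; i++)
    {
        if (!attempted)
        {
            /* One attempt only: a failure means retrying cannot help. */
            attempted = true;
            if (signal_owner_once(pid, SIGTERM) == OUTCOME_DELIVERED)
                (*signals_sent)++;
        }

        /* Stand-in for sleeping on the slot's condition variable. */
        if (waitpid(pid, NULL, WNOHANG) == pid)
            return true;
        nanosleep(&interval, NULL);
    }
    return false;
}
```

In both the ESRCH and the EPERM case the original owner of the PID is already gone, so repeating the signal could at best hit an unrelated process.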
>> I think we've disregarded very quick reuse of
>> PIDs, so we needn't concern ourselves with it.
>
> The first call to ConditionVariableTimedSleep doesn't actually
> sleep, so the loop works as expected. But we may make an extra call
> to kill(2). Calling ConditionVariablePrepareToSleep before the
> loop would make it better.
Sorry, I failed to understand your point...
Anyway, attached is the updated version of the patch. It fixes
all the issues in InvalidateObsoleteReplicationSlots() that I reported
upthread.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
invalidate_obsolete_replication_slots_v2.patch | text/plain | 8.3 KB |