From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | masahiko(dot)sawada(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Race condition in SyncRepGetSyncStandbysPriority |
Date: | 2020-04-16 17:20:04 |
Message-ID: | 43ecec68-abff-6cdb-8412-204451d650ac@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 2020/04/14 22:52, Tom Lane wrote:
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
>> SyncRepGetSyncStandbysPriority() is runing holding SyncRepLock so
>> sync_standby_priority of any walsender can be changed while the
>> function is scanning welsenders. The issue is we already have
>> inconsistent walsender information before we enter the function. Thus
>> how many times we scan on the array doesn't make any difference.
>
> *Yes it does*. The existing code can deliver entirely broken results
> if some walsender exits between where we examine the priorities and
> where we fetch the WAL pointers.
So, in this case, the oldest lsn that SyncRepGetOldestSyncRecPtr()
calculates may be based on also the lsn of already-exited walsender.
This is what you say "broken results"? If yes, ISTM that this issue still
remains even after applying your patch. No? The walsender marked
as sync may still exit just before SyncRepGetOldestSyncRecPtr()
calculates the oldest lsn.
IMO that the broken results can be delivered when walsender marked
as sync exits *and* new walsender comes at that moment. If this new
walsender uses the WalSnd slot that the exited walsender used,
SyncRepGetOldestSyncRecPtr() wronly calculates the oldest lsn based
on this new walsender (i.e., different walsender from one marked as sync).
If this is actually what you tried to say "broken results", your patch
seems fine and fixes the issue.
BTW, since the patch changes the API of SyncRepGetSyncStandbys(),
it should not be back-patched to avoid ABI break. Right?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2020-04-16 17:34:39 | Re: fixing old_snapshot_threshold's time->xid mapping |
Previous Message | Andres Freund | 2020-04-16 17:14:41 | Re: fixing old_snapshot_threshold's time->xid mapping |