From: | Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com> |
---|---|
To: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Race condition in SyncRepGetSyncStandbysPriority |
Date: | 2020-04-17 08:03:11 |
Message-ID: | CA+fd4k5n=bF95XDWw9aJHHRDgN8xFtmUqYNXo6Hk47i+SDDeTw@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 17 Apr 2020 at 14:58, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
>
> At Thu, 16 Apr 2020 11:39:06 -0400, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote in
> > Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> writes:
> > > [ syncrep-fixes-4.patch ]
> >
> > I agree that we could probably improve the clarity of this code with
> > further rewriting, but I'm still very opposed to the idea of having
> > callers know that the first num_sync array elements are the active
> > ones. It's wrong (or at least different from current behavior) for
> > quorum mode, where there might be more than num_sync walsenders to
> > consider. And it might not generalize very well to other syncrep
> > selection rules we might add in future, which might also not have
> > exactly num_sync interesting walsenders. So I much prefer an API
> > definition that uses bool flags in an array that has no particular
> > ordering (so far as the callers know, anyway). If you don't like
> > is_sync_standby, how about some more-neutral name like is_active
> > or is_interesting or include_position?
>
> I'm convinced that each element has is_sync_standby. I agree to the
> name is_sync_standby since I don't come up with a better name.
>
> > I dislike the proposed comment revisions in SyncRepReleaseWaiters,
> > too, particularly the change to say that what we're "announcing"
> > is the ability to release waiters. You did not change the actual
> > log messages, and you would have gotten a lot of pushback if
> > you tried, because the current messages make sense to users and
> > something like that would not. But by the same token this new
> > comment isn't too helpful to somebody reading the code.
>
> The current log messages look perfect to me. I don't insist on the
> comment change since I might take the definition of "sync standby" too
> strictly.
>
> > (Actually, I wonder why we even have the restriction that only
> > sync standbys can release waiters. It's not like they are
> > going to get different results from SyncRepGetSyncRecPtr than
> > any other walsender would. Maybe we should just drop all the
> > am_sync logic?)
>
> I thought the same thing, though I didn't do that in the last patch.
>
> am_sync seems intending to reduce spurious wakeups but actually
> spurious wakeup won't increase even without it. Thus the only
> remaining task of am_sync is the trigger for the log messages and that
> fact is the sign that the log messages should be emitted within
> SyncRepGetSyncRecPtr. That eliminates references to SyncRepConfig in
> SyncRepReleaseWaiters, which make me feel ease.
>
> The attached is baed on syncrep-fixes-1.patch + am_sync elimination.
>
Just for confirmation, since the new approach doesn't change that
walsenders reload new config at their convenient timing, it still can
happen that a walsender releases waiters according to the old config
that defines fewer number of sync standbys, during walsenders
absorbing a change in the set of synchronous walsenders. In the worst
case where the master crashes in the middle, we cannot be sure how
many sync servers the data has been replicated to. Is that right?
Regards,
--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-04-17 08:06:29 | Re: 001_rep_changes.pl stalls |
Previous Message | Kyotaro Horiguchi | 2020-04-17 08:00:15 | Re: 001_rep_changes.pl stalls |