From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | jgdr(at)dalibo(dot)com |
Cc: | andres(at)anarazel(dot)de, michael(at)paquier(dot)xyz, sawada(dot)mshk(at)gmail(dot)com, peter(dot)eisentraut(at)2ndquadrant(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org, thomas(dot)munro(at)enterprisedb(dot)com, sk(at)zsrv(dot)org, michael(dot)paquier(at)gmail(dot)com |
Subject: | Re: [HACKERS] Restricting maximum keep segments by repslots |
Date: | 2020-01-23 12:28:54 |
Message-ID: | 20200123.212854.658794168913258596.horikyota.ntt@gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hello, Jehan.
At Wed, 22 Jan 2020 17:47:23 +0100, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> wrote in
> Hi,
>
> First, it seems you did not reply to Alvaro's concerns in your new set of
> patch. See:
>
> https://www.postgresql.org/message-id/20190917195800.GA16694%40alvherre.pgsql
Mmmm. Thank you very much for noticing that, Jehan, and sorry for
overlooking, Alvaro.
At Tue, 17 Sep 2019 16:58:00 -0300, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote in
> suggest a substitute name, because the API itself doesn't convince me; I
> think it would be sufficient to have it return a single slot name,
> perhaps the one that is behind the most ... or maybe the one that is
> behind the least? This simplifies a lot of code (in particular you do
> away with the bunch of statics, right?), and I don't think the warning
> messages loses anything, because for details the user should really look
> into the monitoring view anyway.
Ok, I removed the fannily-named function. The message become more or
less the following. The DETAILS might not needed.
| WARNING: 2 replication slots have lost required WAL segments by 5 segments
| DETAIL: Most affected slot is s1.
> I didn't like GetLsnAvailability() returning a string either. It seems
> more reasonable to me to define a enum with possible return states, and
> have the enum value be expanded to some string in
> pg_get_replication_slots().
Agreed. Done.
> In the same function, I think that setting restBytes to -1 when
> "useless" is bad style. I would just leave that variable alone when the
> returned status is not one that receives the number of bytes. So the
> caller is only entitled to read the value if the returned enum value is
> such-and-such ("keeping" and "streaming" I think).
That is the only condition. If max_slot_wal_keep_size = -1, The value
is useless for the two states. I added that explanation to the
comment of Get(Lsn)Walavailability().
> I'm somewhat uncomfortable with the API change to GetOldestKeepSegment
> in 0002. Can't its caller do the math itself instead?
Mmm. Finally I found that I merged two calculations that have scarce
relation. You're right here. Thanks.
The attached v18 addressed all of your (Alvaro's) comments.
> On Tue, 24 Dec 2019 21:26:14 +0900 (JST)
> Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:
> > If we assume "losing" segments as "lost", a segment once "lost" can
> > return to "keeping" or "streaming" state. That is intuitively
> > impossible. On the other hand if we assume it as "keeping", it should
> > not be removed by the next checkpoint but actually it can be
> > removed. The state "losing" means such a unstable state different from
> > both "lost" and "keeping".
>
> OK, indeed.
>
> But I'm still unconfortable with this "unstable" state. It would be better if
> we could grab a stable state: either "keeping" or "lost".
I feel the same, but the being-removed WAL segments remain until
checkpoint runs and even after removal replication can continue if
walsender is reading the removed-but-already-opened file. I'll put
more thought on that.
> > In short, the "streaming/normal" state is useless if
> > max_slot_wal_keep_size < max_wal_size.
>
> Good catch!
Thanks!:)
> > Finally I used the following wordings.
> >
> > (there's no "inactive" wal_state)
> >
> > * normal: required WAL within max_wal_size when max_slot_wal_keep_size
> > is larger than max_wal_size.
> > * keeping: required segments are held by replication slots or
> > wal_keep_segments.
> >
> > * losing: required segments are about to be removed or may be already
> > removed but streaming is not dead yet.
>
> As I wrote, I'm still uncomfortable with this state. Maybe we should ask
> other reviewers opinions on this.
>
> [...]
> > > WARNING: some replication slots have lost required WAL segments
> > > DETAIL: Slot slot_limit_st lost 177 segment(s)
> > >
> > > I wonder if this is useful to show these messages for slots that were
> > > already dead before this checkpoint?
> >
> > Makes sense. I changed KeepLogSeg so that it emits the message only on
> > slot_names changes.
>
> Thanks.
>
> Bellow some code review.
Thank you for the review, I don't have a time right now but address
the below comments them soon.
> In regard with FindOldestXLogFileSegNo(...):
>
> > /*
> > * Return the oldest WAL segment file.
> > *
> > * The returned value is XLogGetLastRemovedSegno() + 1 when the function
> > * returns a valid value. Otherwise this function scans over WAL files and
> > * finds the oldest segment at the first time, which could be very slow.
> > */
> > XLogSegNo
> > FindOldestXLogFileSegNo(void)
>
> The comment is not clear to me. I suppose "at the first time" might better be
> expressed as "if none has been removed since last startup"?
>
> Moreover, what about patching XLogGetLastRemovedSegno() itself instead of
> adding a function?
>
> In regard with GetLsnAvailability(...):
>
> > /*
> > * Detect availability of the record at given targetLSN.
> > *
> > * targetLSN is restart_lsn of a slot.
>
> Wrong argument name. It is called "restart_lsn" in the function
> declaration.
>
> > * restBytes is the pointer to uint64 variable, to store the remaining bytes
> > * until the slot goes into "losing" state.
>
> I'm not convinced with this argument name. What about "remainingBytes"? Note
> that you use remaining_bytes elsewhere in your patch.
>
> > * -1 is stored to restBytes if the values is useless.
>
> What about returning a true negative value when the slot is really lost?
>
> All in all, I feel like this function is on the fence between being generic
> because of its name and being slot-only oriented because of the first parameter
> name, use of "max_slot_wal_keep_size_mb", returned status and "slotPtr".
>
> I wonder if it should be more generic and stay here or move to xlogfuncs.c with
> a more specific name?
>
> > * slot limitation is not activated, WAL files are kept unlimitedlllly
>
> "unlimitedly"? "infinitely"? "unconditionally"?
>
> > /* it is useless for the states below */
> > *restBytes = -1;
>
> This might be set to the real bytes kept, even if status is "losing".
>
> > * The segment is alrady lost or being lost. If the oldest segment is just
>
> "already"
>
> > if (oldestSeg == restartSeg + 1 && walsender_pid != 0)
> > return "losing";
>
> I wonder if this should be "oldestSeg > restartSeg"?
> Many segments can be removed by the next or running checkpoint. And a running
> walsender can send more than one segment in the meantime I suppose?
>
> In regard with GetOldestKeepSegment(...):
>
> > static XLogSegNo
> > GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
> > XLogRecPtr targetLSN, int64 *restBytes)
>
> I wonder if minSlotLSN is really useful as a parameter or if it should be
> fetched from GetOldestKeepSegment() itself? Currently,
> XLogGetReplicationSlotMinimumLSN() is always called right before
> GetOldestKeepSegment() just to fill this argument.
>
> > walstate =
> > GetLsnAvailability(restart_lsn, active_pid, &remaining_bytes);
>
> I agree with Alvaro: we might want to return an enum and define the related
> state string here. Or, if we accept negative remaining_bytes, GetLsnAvailability
> might even only return remaining_bytes and we deduce the state directly from
> here.
>
> Regards,
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v18-0001-Add-WAL-relief-vent-for-replication-slots.patch | text/x-patch | 11.1 KB |
v18-0002-Add-monitoring-aid-for-max_slot_wal_keep_size.patch | text/x-patch | 21.6 KB |
v18-0003-Add-primary_slot_name-to-init_from_backup-in-TAP.patch | text/x-patch | 1.0 KB |
v18-0004-TAP-test-for-the-slot-limit-feature.patch | text/x-patch | 8.1 KB |
v18-0005-Documentation-for-slot-limit-feature.patch | text/x-patch | 5.0 KB |
v18-0006-Check-removal-of-in-reading-segment-file.patch | text/x-patch | 2.3 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kyotaro Horiguchi | 2020-01-23 12:33:25 | Re: [HACKERS] Restricting maximum keep segments by repslots |
Previous Message | Mahendra Singh Thalor | 2020-01-23 12:21:00 | Re: Error message inconsistency |