Re: [HACKERS] Restricting maximum keep segments by repslots

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, sk(at)zsrv(dot)org, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-07-05 06:43:56
Message-ID: CAD21AoDiiA4qHj0thqw80Bt=vefSQ9yGpZnr0kuLTXszbrV9iQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 4, 2018 at 5:28 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Hello.
>
> At Tue, 26 Jun 2018 16:26:59 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180626(dot)162659(dot)223208514(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
>> The previous patche files doesn't have version number so I let
>> the attached latest version be v2.
>>
>>
>> v2-0001-Add-WAL-releaf-vent-for-replication-slots.patch
>> The body of the limiting feature
>>
>> v2-0002-Add-monitoring-aid-for-max_replication_slots.patch
>> Shows the status of WAL rataining in pg_replication_slot view
>>
>> v2-0003-TAP-test-for-the-slot-limit-feature.patch
>> TAP test for this feature
>>
>> v2-0004-Documentation-for-slot-limit-feature.patch
>> Documentation, as the name.
>
> Travis (test_decoding test) showed that GetOldestXLogFileSegNo
> added by 0002 forgets to close temporarily opened pg_wal
> directory. This is the fixed version v3.
>

Thank you for updating the patch! I looked at v3 patches. Here is
review comments.

---
+ {"max_slot_wal_keep_size", PGC_SIGHUP, REPLICATION_SENDING,
+ gettext_noop("Sets the maximum size of extra
WALs kept by replication slots."),
+ NULL,
+ GUC_UNIT_MB
+ },
+ &max_slot_wal_keep_size_mb,
+ 0, 0, INT_MAX,
+ NULL, NULL, NULL
+ },

Maybe MAX_KILOBYTES would be better instead of INT_MAX like wal_max_size.

---
Once the following WARNING emitted this message is emitted whenever we
execute CHECKPOINT even if we don't change anything. Is that expected
behavior? I think it would be better to emit this message only when we
remove WAL segements that are required by slots.

WARNING: some replication slots have lost required WAL segments
DETAIL: The mostly affected slot has lost 153 segments.

---
+ Assert(wal_keep_segments >= 0);
+ Assert(max_slot_wal_keep_size_mb >= 0);

These assertions are not meaningful because these parameters are
ensured >= 0 by those definition.

---
+ /* slots aren't useful, consider only wal_keep_segments */
+ if (slotpos == InvalidXLogRecPtr)
+ {

Perhaps XLogRecPtrIsInvalid(slotpos) is better.

---
+ if (slotpos != InvalidXLogRecPtr && currSeg <= slotSeg + wal_keep_segments)
+ slotpos = InvalidXLogRecPtr;
+
+ /* slots aren't useful, consider only wal_keep_segments */
+ if (slotpos == InvalidXLogRecPtr)
+ {

This logic is hard to read to me. The slotpos can be any of: valid,
valid but then become invalid in halfway or invalid from beginning of
this function. Can we convert this logic to following?

if (XLogRecPtrIsInvalid(slotpos) ||
currSeg <= slotSeg + wal_keep_segments)

---
+ keepSegs = wal_keep_segments +
+ ConvertToXSegs(max_slot_wal_keep_size_mb, wal_segment_size);

Why do we need to keep (wal_keep_segment + max_slot_wal_keep_size) WAL
segments? I think what this feature does is, if wal_keep_segments is
not useful (that is, slotSeg < (currSeg - wal_keep_segment) then we
normally choose slotSeg as lower boundary but max_slot_wal_keep_size
restrict the lower boundary so that it doesn't get lower than the
threshold. So I thought what this function should do is to calculate
min(currSeg - wal_keep_segment, max(currSeg - max_slot_wal_keep_size,
slotSeg)), I might be missing something though.

---
+ SpinLockAcquire(&XLogCtl->info_lck);
+ oldestSeg = XLogCtl->lastRemovedSegNo;
+ SpinLockRelease(&XLogCtl->info_lck);

We can use XLogGetLastRemovedSegno() instead.

---
+ xldir = AllocateDir(XLOGDIR);
+ if (xldir == NULL)
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not open write-ahead log directory \"%s\": %m",
+ XLOGDIR)));

Looking at other code allocating a directory we don't check xldir ==
NULL because it will be detected by ReadDir() function and raise an
error in that function. So maybe we don't need to check it just after
allocation.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2018-07-05 07:11:53 Re: [HACKERS] WAL logging problem in 9.4.3?
Previous Message Kato, Sho 2018-07-05 06:39:05 RE: Speeding up INSERTs and UPDATEs to partitioned tables