Re: [HACKERS] Restricting maximum keep segments by repslots

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Sergei Kornilov <sk(at)zsrv(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: [HACKERS] Restricting maximum keep segments by repslots
Date: 2018-09-05 05:31:10
Message-ID: CAD21AoB-HJvL+uKsv40Gb8Dymh9uBBQUXTucqv4MDtH_AGKh4g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 4, 2018 at 7:52 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> At Mon, 3 Sep 2018 18:14:22 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoBgCMc9bp2cADMFm40qoEXxbomdu1dtj5EaFSAS4BtAyw(at)mail(dot)gmail(dot)com>
>> Thank you for updating the patch!
>>
>> On Tue, Jul 31, 2018 at 6:11 PM, Kyotaro HORIGUCHI
>> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> > Hello.
>> >
>> > At Tue, 24 Jul 2018 16:47:41 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoD0rChq7wQE=_o95quopcQGjcVG9omwdH07nT5cm81hzg(at)mail(dot)gmail(dot)com>
>> >> On Mon, Jul 23, 2018 at 4:16 PM, Kyotaro HORIGUCHI
>> >> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> >> > Hello.
>> >> >
>> >> > At Fri, 20 Jul 2018 10:13:58 +0900, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote in <CAD21AoDayePWwu4t=VPP5P1QFDSBvks1d8j76bXp5rbXoPbZcA(at)mail(dot)gmail(dot)com>
>> >> This funk should be updated.
>> >
>> > Perhaps you need a fresh database cluster.
>>
>> I meant this was a doc update in 0004 patch but it's fixed in v7 patch.
>
> Wow..
>
>> While testing the v7 patch, I got the following result with
>> max_slot_wal_keep_size = 5GB and without wal_keep_segments setting.
>>
>> =# select pg_current_wal_lsn(), slot_name, restart_lsn,
>> confirmed_flush_lsn, wal_status, remain, pg_size_pretty(remain) from
>> pg_replication_slots ;
>> pg_current_wal_lsn | slot_name | restart_lsn | confirmed_flush_lsn |
>> wal_status | remain | pg_size_pretty
>> --------------------+-----------+-------------+---------------------+------------+----------+----------------
>> 2/A30000D8 | l1 | 1/AC000910 | 1/AC000948 |
>> streaming | 16777000 | 16 MB
>> (1 row)
>>
>> The actual distance between the slot limit and the slot 'l1' is about
>> 1GB(5GB - (2/A30000D8 - 1/AC000910)) but the system view says the
>> remain is only 16MB. For the calculation of resetBytes in
>> GetOldestKeepSegment(), the current patch seems to calculate the
>> distance between the minSlotLSN and restartLSN when (curLSN -
>> max_slot_wal_keep_size) < minSlotLSN. However, I think that the actual
>> remained bytes until the slot lost the required WAL is (restartLSN -
>> (currLSN - max_slot_wal_keep_size)) in that case.
>
> Oops! That's a silly thinko or rather a typo. It's apparently
> wrong that keepSegs instead of limitSegs is involved in making
> the calculation of restBytes. Just using limitSegs makes it
> sane. It's a pity that I removed the remain from regression test.
>
> Fixed that and added an item for remain calculation in the TAP
> test. I expect that pg_size_pretty() adds some robustness to the
> test.
>
>> Also, 0004 patch needs to be rebased on the current HEAD.
>
> Done. Please find the v8 attached.
>

Thank you for updating! Here is the review comment for v8 patch.

+ /*
+ * This slot still has all required segments. Calculate how many
+ * LSN bytes the slot has until it loses restart_lsn.
+ */
+ fragbytes = wal_segment_size - (currLSN % wal_segment_size);
+ XLogSegNoOffsetToRecPtr(restartSeg + limitSegs - currSeg,
fragbytes,
+ wal_segment_size, *restBytes);

For the calculation of fragbytes, I think we should calculate the
fragment bytes of restartLSN instead. The the formula "restartSeg +
limitSegs - currSeg" means the # of segment between restartLSN and the
limit by the new parameter. I don't think that the summation of it and
fragment bytes of currenLSN is correct. As the following result
(max_slot_wal_keep_size is 128MB) shows, the remain column shows the
actual remains + 16MB (get_bytes function returns the value of
max_slot_wal_keep_size in bytes).

postgres(1:29447)=# select pg_current_wal_lsn(), slot_name,
restart_lsn, wal_status, remain, pg_size_pretty(remain),
pg_size_pretty(get_bytes('max_slot_wal_keep_size') -
(pg_current_wal_lsn() - restart_lsn)) from pg_replication_slots ;
pg_current_wal_lsn | slot_name | restart_lsn | wal_status | remain
| pg_size_pretty | pg_size_pretty
--------------------+-----------+-------------+------------+-----------+----------------+----------------
0/1D0001F0 | l1 | 0/1D0001B8 | streaming | 150994448
| 144 MB | 128 MB
(1 row)

---
If the wal_keeps_segments is greater than max_slot_wal_keep_size, the
wal_keep_segments doesn't affect the value of the remain column.

postgres(1:48422)=# show max_slot_wal_keep_size ;
max_slot_wal_keep_size
------------------------
128MB
(1 row)

postgres(1:48422)=# show wal_keep_segments ;
wal_keep_segments
-------------------
5000
(1 row)

postgres(1:48422)=# select slot_name, wal_status, remain,
pg_size_pretty(remain) as remain from pg_replication_slots ;
slot_name | wal_status | remain | remain
-----------+------------+-----------+--------
l1 | streaming | 150994728 | 144 MB
(1 row)

*** After consumed over 128MB WAL ***

postgres(1:48422)=# select slot_name, wal_status, remain,
pg_size_pretty(remain) as remain from pg_replication_slots ;
slot_name | wal_status | remain | remain
-----------+------------+--------+---------
l1 | streaming | 0 | 0 bytes
(1 row)

---
For the cosmetic stuff there are code where need the line break.

static void CheckPointGuts(XLogRecPtr checkPointRedo, int flags);
+static XLogSegNo GetOldestKeepSegment(XLogRecPtr currpos, XLogRecPtr
minSlotPtr, XLogRecPtr restartLSN, uint64 *restBytes);
static void KeepLogSeg(XLogRecPtr recptr, XLogSegNo *logSegNo);
static XLogRecPtr XLogGetReplicationSlotMinimumLSN(void);

and

+static XLogSegNo
+GetOldestKeepSegment(XLogRecPtr currLSN, XLogRecPtr minSlotLSN,
XLogRecPtr restartLSN, uint64 *restBytes)
+{

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Victor Wagner 2018-09-05 06:21:30 Re: Bug fix for glibc broke freebsd build in REL_11_STABLE
Previous Message Amit Langote 2018-09-05 05:25:16 Re: speeding up planning with partitions