From: | Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> |
---|---|
To: | peter(dot)eisentraut(at)2ndquadrant(dot)com |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Restricting maximum keep segments by repslots |
Date: | 2017-09-07 12:59:56 |
Message-ID: | 20170907.215956.110216588.horiguchi.kyotaro@lab.ntt.co.jp |
Lists: | pgsql-hackers |
Hello,
At Thu, 07 Sep 2017 14:12:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170907(dot)141212(dot)227032666(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > I would like a flag in pg_replication_slots, and possibly also a
> > numerical column that indicates how far away from the critical point
> > each slot is. That would be great for a monitoring system.
>
> Great! I'll do that right now.
Done.
In the attached patch, which applies on top of the previous one, I added two
columns to pg_replication_slots, "live" and "distance". The first
indicates whether the slot will "live" after the next checkpoint. The
second shows how many bytes the checkpoint LSN can advance before
the slot "dies", or how many bytes the slot has lost after
"death".
With wal_keep_segments = 1 and max_slot_wal_keep_size = 16MB:
=# select slot_name, restart_lsn, pg_current_wal_lsn(), live, distance from pg_replication_slots;
slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
s1 | 0/162D388 | 0/162D3C0 | t | 0/29D2CE8
This shows that the checkpoint can advance 0x29d2ce8 bytes before the
slot dies, even if the connection stalls.
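The distance column is printed in LSN notation (two hexadecimal halves separated by a slash), so a monitoring script would need to turn it into a plain byte count. A minimal sketch in Python, assuming the usual high/low 32-bit LSN text format; the helper name lsn_to_bytes is made up for illustration and is not part of the patch:

```python
def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN-style string such as '0/29D2CE8' to a 64-bit integer."""
    high, low = lsn.split("/")
    # The part before the slash holds the upper 32 bits, the part after it the lower 32.
    return (int(high, 16) << 32) | int(low, 16)


# The distance reported for s1 in the first example above.
distance = lsn_to_bytes("0/29D2CE8")
print(distance)                    # 43855080
print(round(distance / 2**20, 1))  # 41.8 (MiB)
```

So the slot in that row has roughly 41.8 MiB of headroom left.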
s1 | 0/4001180 | 0/6FFF2B8 | t | 0/DB8
Just before the slot loses sync.
s1 | 0/4001180 | 0/70008A8 | f | 0/FFEE80
The checkpoint after this removes some required segments.
2017-09-07 19:04:07.677 JST [13720] WARNING: restart LSN of replication slots is ignored by checkpoint
2017-09-07 19:04:07.677 JST [13720] DETAIL: Some replication slots have lost required WAL segnents to continue by up to 1 segments.
If max_slot_wal_keep_size is not set (0), live is always true and
distance is NULL.
slot_name | restart_lsn | pg_current_wal_lsn | live | distance
-----------+-------------+--------------------+------+-----------
s1 | 0/4001180 | 0/73117A8 | t |
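For the monitoring use case mentioned in the quoted request, the two columns could be folded into a single status per slot. The following is only a sketch of one possible client-side policy, not anything the patch does: the function names, the state labels, and the 16 MiB warning threshold are all assumptions chosen for illustration.

```python
from typing import Optional


def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN-style string such as '0/DB8' to an integer byte count."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def slot_state(live: bool, distance: Optional[str],
               warn_bytes: int = 16 * 1024 * 1024) -> str:
    """Classify one row of pg_replication_slots (hypothetical policy).

    warn_bytes is an arbitrary threshold, not a server setting.
    """
    if distance is None:
        # max_slot_wal_keep_size is not set: the slot is never cut off.
        return "unlimited"
    if not live:
        # The next checkpoint removes WAL that the slot still needs.
        return "lost"
    if lsn_to_bytes(distance) < warn_bytes:
        return "warning"
    return "ok"


# The (live, distance) pairs from the four sample rows in this mail.
for live, distance in [(True, "0/29D2CE8"), (True, "0/DB8"),
                       (False, "0/FFEE80"), (True, None)]:
    print(slot_state(live, distance))
```

With the sample rows this prints ok, warning, lost and unlimited, in that order.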
- The names (or contents) of the new columns are open to discussion.
- The pg_replication_slots view takes an LWLock on ControlFile and a
spinlock on XLogCtl for every slot, but it seems difficult to
reduce that.
- distance seems to mistakenly become 0/0 under certain conditions.
- The result seems almost right, but a more precise check is needed.
(Anyway, it cannot be perfectly exact.)
regards,
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
0002-Add-monitoring-aid-for-max_replication_slots.patch | text/x-patch | 7.2 KB |