From: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Review for GetWALAvailability() |
Date: | 2020-06-12 16:38:49 |
Message-ID: | 8d55969f-ba37-b89a-6494-e9322ccdb35d@oss.nttdata.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
The document explains that "lost" value that
pg_replication_slots.wal_status reports means
some WAL files are definitely lost and this slot cannot be used to resume replication anymore.
However, I observed "lost" value while inserting lots of records,
but replication could continue normally. So I wonder if
pg_replication_slots.wal_status may have a bug.
wal_status is calculated in GetWALAvailability(), and probably I found
some issues in it.
keepSegs = ConvertToXSegs(Max(max_wal_size_mb, wal_keep_segments),
wal_segment_size) + 1;
max_wal_size_mb is the number of megabytes. wal_keep_segments is
the number of WAL segment files. So it's strange to calculate max of them.
The above should be the following?
Max(ConvertToXSegs(max_wal_size_mb, wal_segment_size), wal_keep_segments) + 1
if ((max_slot_wal_keep_size_mb <= 0 ||
max_slot_wal_keep_size_mb >= max_wal_size_mb) &&
oldestSegMaxWalSize <= targetSeg)
return WALAVAIL_NORMAL;
This code means that wal_status reports "normal" only when
max_slot_wal_keep_size is negative or larger than max_wal_size.
Why is this condition necessary? The document explains "normal
means that the claimed files are within max_wal_size". So whatever
max_slot_wal_keep_size value is, IMO that "normal" should be
reported if the WAL files claimed by the slot are within max_wal_size.
Thought?
Or, if that condition is really necessary, the document should be
updated so that the note about the condition is added.
If the WAL files claimed by the slot exceeds max_slot_wal_keep_size
but any those WAL files have not been removed yet, wal_status seems
to report "lost". Is this expected behavior? Per the meaning of "lost"
described in the document, "lost" should be reported only when
any claimed files are removed, I think. Thought?
Or this behavior is expected and the document is incorrect?
BTW, if we want to implement GetWALAvailability() as the document
advertises, we can simplify it like the attached POC patch.
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Attachment | Content-Type | Size |
---|---|---|
simple_getwalavailability_wip.patch | text/plain | 2.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Kenichiro Tanaka | 2020-06-12 16:40:42 | Re: Wrong width of UNION statement |
Previous Message | Emre Hasegeli | 2020-06-12 15:34:15 | Re: exp() versus the POSIX standard |