From: | Bruce Momjian <bruce(at)momjian(dot)us> |
---|---|
To: | Andres Freund <andres(at)anarazel(dot)de> |
Cc: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Subject: | Re: pg_walfile_name_offset can return inconsistent values |
Date: | 2023-11-13 17:14:57 |
Message-ID: | ZVJZkU9O98Jbydml@momjian.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Nov 10, 2023 at 07:59:43PM -0800, Andres Freund wrote:
> Hi,
>
> On 2023-11-09 16:14:07 -0500, Bruce Momjian wrote:
> > On Thu, Nov 9, 2023 at 09:49:48PM +0100, Matthias van de Meent wrote:
> > > Either way, I think fix #1 is most correct (as was attached in
> > > offset2.diff, and quoted verbatim here), because that has no chance of
> > > having surprising underflowing behaviour when you use '0/0'::lsn as
> > > input.
> >
> > Attached is the full patch that changes pg_walfile_name_offset() and
> > pg_walfile_name(). There is no need for doc changes.
>
> I think this needs to add tests "documenting" the changed behaviour and
> perhaps also for a few other edge cases. You could e.g. test
> SELECT * FROM pg_walfile_name_offset('0/0')
> which today returns bogus values, and which is independent of the wal segment
> size.
>
> And with
> SELECT setting::int8 AS segment_size FROM pg_settings WHERE name = 'wal_segment_size' \gset
> you can test real things too, e.g.:
> SELECT segment_number, file_offset FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size), pg_split_walfile_name(file_name);
> SELECT segment_number, file_offset FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size + 1), pg_split_walfile_name(file_name);
> SELECT segment_number, file_offset = :segment_size - 1 FROM pg_walfile_name_offset('0/0'::pg_lsn + :segment_size - 1), pg_split_walfile_name(file_name);
Sure, I have added these tests.
FYI, pg_walfile_name_offset() has this C comment, which I have removed
in this patch;
* Note that a location exactly at a segment boundary is taken to be in
* the previous segment. This is usually the right thing, since the
* expected usage is to determine which xlog file(s) are ready to archive.
There is also a documentation mention of this behavior:
When the given write-ahead log location is exactly at a write-ahead log file boundary, both
these functions return the name of the preceding write-ahead log file.
This is usually the desired behavior for managing write-ahead log archiving
behavior, since the preceding file is the last one that currently
needs to be archived.
After seeing the doc mention, I started digging into the history of this
feature. It was originally called pg_current_xlogfile_offset() and
proposed in this email thread, which started on 2006-07-31:
https://www.postgresql.org/message-id/flat/1154384790.3226.21.camel%40localhost.localdomain
In the initial patch by Simon Riggs, there was no "previous segment
file" behavior, just a simple filename/offset calculation.
This was applied on 2006-08-06 with this commit:
commit 704ddaaa09
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Sun Aug 6 03:53:44 2006 +0000
Add support for forcing a switch to a new xlog file; cause such a switch
to happen automatically during pg_stop_backup(). Add some functions for
interrogating the current xlog insertion point and for easily extracting
WAL filenames from the hex WAL locations displayed by pg_stop_backup
and friends. Simon Riggs with some editorialization by Tom Lane.
and the email of the commit said:
https://www.postgresql.org/message-id/18457.1154836638%40sss.pgh.pa.us
I also made the new user-level functions a bit more orthogonal, so
that filenames could be extracted from the existing functions like
pg_stop_backup.
There is later talk about returning last write pointer vs. the current
insert pointer, and having it match what is returned by pg_stop_backup():
https://www.postgresql.org/message-id/1155124565.2368.95.camel%40localhost.localdomain
Methinks it should be the Write pointer all of the time, since I can't
think of a valid reason for wanting to know where the Insert pointer is
*before* we've written to the xlog file. Having it be the Insert pointer
could lead to some errors.
and I suspect that it was the desire to return the last write pointer
that caused the function to return the previous segment on a boundary
offset. This was intended to simplify log shipping implementations, I
think.
The function eventually was renamed in the xlog-to-wal renaming and moved
from xlog.c to xlogfuncs.c. This thread in 2022 mentioned the
inconsistency for 0/0, but didn't seem to talk about the inconsistency
of offset vs file name:
and it concluded with:
Yes, its the deliberate choice of design, or a kind of
questionable-but-unoverturnable decision. I think there are many
external tools conscious of this behavior.
However, with the report about the inconsistency, the attached patch
fixes the behavior and removes the documentation about the odd behavior.
This will need to be mentioned as an incompatibility in the major
version release notes.
--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
Attachment | Content-Type | Size |
---|---|---|
offset1.diff | text/x-diff | 5.6 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Robert Haas | 2023-11-13 17:21:34 | Re: Is this a problem in GenericXLogFinish()? |
Previous Message | Joe Conway | 2023-11-13 16:57:56 | Re: How to solve the problem of one backend process crashing and causing other processes to restart? |